What to Consider When Choosing Storage?
There is a plethora of storage options nowadays: SSDs, hard drives, PCIe, SATA III, SAS, SCSI, NVMe, M.2. With so many options and things to consider, what should you really be looking at? I'll run you through the main metrics to look at so you can find the right storage for you.
The first thing I will talk about is capacity, as that is what most people are familiar with. Capacity is just the amount of data that you can store. It is usually given in GB (gigabytes) or TB (terabytes). A terabyte is 1000 gigabytes in the hard drive world, so an SSD with a capacity of 500GB holds half as much as a 1TB hard drive. When looking at capacity you just need to know roughly how much data you plan to store. If you will be backing up all of the video for your media company, you will need much more storage than someone who just uses spreadsheets and browses the web. Figure out roughly what you need and make sure you can fit all of your data, either on a single drive or, if you have to, across multiple drives. Just check that your system has enough slots for additional drives. If you don't have any free drive slots, you can purchase an external drive or a NAS.
The next thing on the list is longevity. It's an unfortunate truth that none of our storage options last forever. All of them fail eventually. What you have to consider is how long you plan on using a certain system, because your next system will likely have bigger and faster storage. If you are planning on upgrading every couple of years, it is not entirely necessary to buy a drive with a 5 year warranty. If you are instead trying to archive your family photos, you will really want to check what you can expect from a certain type of drive. Recently SSDs have shown that they can last much longer than hard drives if properly maintained, but if you have a lot of data they might not offer enough capacity to store everything you have. If you really do need extreme longevity and capacity, you may even consider a tape-based system. Some tape media are rated for an archival life of around 30 years, with capacities of 15+ TB.
Another thing to consider is the type of failure. Hard drives fail mechanically, and the platters inside them are often very recoverable, provided the read-write head hasn't scratched them and they haven't been otherwise physically damaged. This means that while the drive may be dead, the data still has a chance of being recovered. With SSDs it is a little more difficult because the flash chips can't be read out manually as easily as a hard drive's platters can. Currently there is no guaranteed method to restore your data, so an SSD with a good warranty might last longer, but it might also fail in such a way that you cannot recover your data, and if you can, it probably won't be cheap.
Finally we can talk about performance. As always, you need to figure out what your use case will be. Storage that just backs up your files at the end of the day needs something quite different from storage that makes your daily computing experience responsive and enjoyable. The two main things to consider are read-write throughput and IOPS. Read throughput is how much data the drive can read per second, and write throughput is how much it can write per second. In some scenarios you will read data more often than you write it, and vice versa, so consider that when you are choosing storage. If you are planning a nightly backup system that needs to back up 8TB of data and your drive has a write throughput of 200MB/s, the backup would take roughly 11 hours at best, so you might not be able to finish it in a night and may need to look for an alternate solution. IOPS, on the other hand, is the number of individual operations a storage device can do in a second. A high number here will make Windows and games load faster and will speed up searches through your computer's files. This is a very important number for day-to-day computers, as opposed to systems that just store large files.
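To make the backup example concrete, here is a small sketch of the arithmetic. The function name backup_hours is just something I made up for illustration, and it assumes the drive sustains its rated write throughput the whole time with no other overhead, which real backups rarely manage.

```python
def backup_hours(data_tb: float, write_mb_per_s: float) -> float:
    """Best-case hours needed to write data_tb terabytes.

    Assumes decimal drive units (1 TB = 1,000,000 MB) and a constant
    sustained write speed, which is an optimistic simplification.
    """
    total_mb = data_tb * 1_000_000
    seconds = total_mb / write_mb_per_s
    return seconds / 3600


# The example from the text: 8 TB at 200 MB/s
print(f"{backup_hours(8, 200):.1f} hours")  # prints "11.1 hours"
```

If the result is longer than your backup window, you need faster storage, less data per night (incremental backups, for example), or more than one drive working at once.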
Now that you know the main metrics to judge storage by, I hope you can make a more informed decision.
Hard Drives vs Solid State Drives
Before I get into the granular details I'll explain things in simple terms. A hard drive in your computer is basically a stack of discs, like the ones you used to put into a CD player. These discs are called platters, and the internals of a hard drive read and write to them much like you would burn a disc and then play it back. This is likely what is storing all of the files on your computer.
A solid state drive, often called an SSD, is a lot more like the flash drive you sometimes use to move documents from one computer to another. It is made up of small memory chips that store and retrieve data electrically. You might have one of these storing all of the files on your computer instead.
Now that you understand on a basic level what a hard drive and a solid state drive are, we can talk a little more about what they do and how they differ. The first one I will cover is the hard drive. As I explained earlier, a hard drive is basically a collection of discs stacked up, all ready to be read and written to without needing to swap anything out. It stores data by changing the magnetization of tiny areas on the platter surface. Each platter is first split up into tracks, which are the concentric paths around it. These are like the eight lanes on a running track, but on a hard drive there are many more than 8. The tracks are then further split up into sectors.
Continuing with the running track analogy, this is like the stretch from 0 to 100m being one sector and from 100m to 200m being another. With this organization system it is much easier for the hard drive to figure out where things are, because it can be directed to the correct track and sector and then look in that specific area for the data. Just like a runner on a track, it takes the drive a little while to get to the destination. This delay before any data can be transferred is caused by a few things. First the read-write head must move to the correct track. Then the platter must spin around until the correct sector is under the head. The time it takes for the read-write head to move to the correct track is called the seek time, and the time it takes on average for the correct sector to come around is called the rotational latency. This delay means that a hard drive is not as good at handling lots of small, randomly scattered pieces of data, but it is good at continuous reads and writes. This is why it is very beneficial to defragment your hard drive every once in a while. Defragmenting organizes your data into a less scattered layout so that it can be read faster without as much track seeking and sector searching.
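As a rough illustration of how these two delays limit a hard drive, the sketch below estimates how many random operations per second a drive can manage. The 7200 RPM spindle speed and 9 ms average seek time are numbers I picked as an assumption for the example, not figures for any particular drive, and the estimate ignores transfer time and caching.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for the right sector: half of one full rotation, in ms."""
    ms_per_rotation = 60_000 / rpm
    return ms_per_rotation / 2


def random_ops_per_second(avg_seek_ms: float, rpm: float) -> float:
    """Rough upper bound on random operations per second.

    Each operation pays one average seek plus one average rotational
    latency; transfer time and caching are ignored (a simplification).
    """
    per_op_ms = avg_seek_ms + avg_rotational_latency_ms(rpm)
    return 1000 / per_op_ms


# Assumed example drive: 7200 RPM, 9 ms average seek
latency = avg_rotational_latency_ms(7200)   # about 4.17 ms
ops = random_ops_per_second(9, 7200)        # about 76 operations per second
print(f"{latency:.2f} ms rotational latency, about {ops:.0f} random ops/s")
```

Even with generous numbers the result stays in the tens to low hundreds, which is why scattered data hurts a hard drive so much more than one long continuous read.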
The number of random operations a storage device can do is usually measured in IOPS (input/output operations per second). As stated, these are relatively low for hard drives, which likely will not exceed a couple hundred IOPS. On the flip side, their throughput, the measure of their sustained reads and writes, is pretty good. This means that if you are doing a large backup and just copying all of the files over, the drive can go from track to track copying everything relatively quickly. Throughput is measured in MB/s (megabytes per second). For hard drives, it will likely top out somewhere a little above the 200 MB/s range. So they have low IOPS and decent throughput, so why does everyone still use them? The main reason is that they are still kings when it comes to capacity and cost. Capacities go all the way up to 12TB, and larger drives are still on the roadmap. Even while cramming 12TB into a single drive, they remain relatively affordable. A 4TB drive can be had for roughly $100 USD, and for many people this is way more storage than they will ever need. A common metric people look at is price per GB (gigabyte). For a hard drive like the 4TB at $100 it is a very impressive $0.025/GB.
Now that we have thoroughly covered hard drives we can move on to SSDs. The biggest advantage SSDs have is speed. Every which way you look at it, SSDs are faster. Their IOPS are off the charts in comparison to hard drives. While hard drives are in the range of a couple hundred, some of the faster SSDs are in the neighborhood of over 100,000. This is partially because they do not have to spend any time seeking tracks or waiting for sectors. There are no moving parts, so the only delay is the time it takes to process the request and electrically address the right memory cells. At the same time, the throughput is very high. These devices are so fast that they have maxed out SATA, the interface traditionally used to connect hard drives, and some have moved to slots directly on the motherboard.
The fastest SSDs can push a throughput of 2000+ MB/s, ten times faster than the fastest hard drives. Another advantage of SSDs is their hardiness. Because they have no moving parts like a hard drive has, they can endure more vibration, impact, pressure and temperature. They are now incredibly reliable devices that can last 10+ years, while many hard drives will not make it past 5. The SSD's downfall, at least for now, is that the price per GB is relatively high and they do not come in sizes comparable to hard drives without costing an arm and a leg. You can expect a single 1TB SSD to cost over $200 even if you get a great deal. That works out to $0.20/GB, meaning the SSD's price per GB is 8 times that of the 4TB hard drive.
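The price-per-GB comparison is easy to check for yourself. Here is a minimal sketch using the two example prices from this article; the function name price_per_gb is my own, and you would swap in whatever prices you actually find when shopping.

```python
def price_per_gb(price_usd: float, capacity_gb: float) -> float:
    """Cost of one gigabyte, using decimal units (1 TB = 1000 GB)."""
    return price_usd / capacity_gb


# Example prices from the article (assumed, not current market prices)
hdd = price_per_gb(100, 4000)   # 4 TB hard drive at $100
ssd = price_per_gb(200, 1000)   # 1 TB SSD at $200

print(f"HDD: ${hdd:.3f}/GB")
print(f"SSD: ${ssd:.3f}/GB")
print(f"The SSD costs about {ssd / hdd:.0f}x as much per GB")
```

Running the same calculation on a few candidate drives is a quick way to see whether a bigger drive is actually the better deal.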
So in summary, SSDs are the pure performance kings but currently cost quite a bit more per gigabyte, so they are best suited to applications where extreme speed is needed. Hard drives are still widespread and have a legitimate place in many applications. Anything that eats up lots of storage, like video and audio, is a good place to use hard drives if you cannot swing the budget for SSDs.
What is RAID and why would I use it?
You may have read on forums or seen someone mention RAID 0 when bragging about their system, without really knowing what it was about. RAID is an acronym that stands for Redundant Array of Inexpensive Disks (you will also see it expanded as Redundant Array of Independent Disks). At the heart of it, RAIDing drives means combining multiple hard drives, solid state drives or other storage media in order to add performance, make a system more resilient to drive failure, or sometimes both. There are a few different types of RAID, and I think once I explain each type you will have a better understanding of what RAID is really good for.
The first type I am going to go over is RAID 0. For a long time this was very popular, but it has slowly declined as SSDs have become more common. In this kind of RAID you stripe the data across 2 or more drives. What I mean by this is that if you have two drives, you put one part of a file on hard drive A and the next part on hard drive B. This is fantastic for speed because instead of waiting for one drive to read the entire file, you split the load across two drives and can theoretically get it done in half the time. This comes at a great cost, though: if one of your two drives fails, every file is missing pieces and the data likely isn't recoverable. This means total data loss. As you add more drives, RAID 0 makes less and less sense, because every drive you add increases the chance of total data loss. You only need one of your many drives to fail for the entire array to go up in smoke. RAID 0 has also become less popular because SSDs are so fast that RAIDed hard drives can't match even a single SSD's speed and responsiveness, and very few consumers can justify the cost of more than one SSD for a negligible real-world performance boost. Going twice as fast when your response time is already nearly instant does not make a huge difference for the end user.
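Here is a toy sketch of both ideas: how striping deals chunks of a file out to the drives in turn, and how the chance of losing everything grows with the number of drives. The tiny 4-byte chunk size and the 5% per-drive failure chance are assumptions I picked for illustration, and the failure math assumes drives fail independently of each other. Real RAID controllers work on much larger stripes and at the block level.

```python
def stripe(data: bytes, num_drives: int, chunk_size: int = 4) -> list:
    """Deal fixed-size chunks of data out to the drives round-robin."""
    drives = [bytearray() for _ in range(num_drives)]
    for offset in range(0, len(data), chunk_size):
        chunk_index = offset // chunk_size
        drives[chunk_index % num_drives] += data[offset:offset + chunk_size]
    return [bytes(d) for d in drives]


def raid0_failure_probability(p_drive: float, num_drives: int) -> float:
    """Chance that at least one drive fails, which loses the whole array.

    Assumes every drive fails independently with probability p_drive.
    """
    return 1 - (1 - p_drive) ** num_drives


# Each drive ends up holding only part of the file
print(stripe(b"ABCDEFGHIJKL", 2))  # [b'ABCDIJKL', b'EFGH']

# Assumed 5% chance of any single drive failing in some period
for n in (1, 2, 4, 8):
    print(n, "drives:", f"{raid0_failure_probability(0.05, n):.1%}")
```

Notice that neither drive in the striping example holds a complete copy of the file, which is exactly why one failure takes everything with it.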
The second type I'll explain is RAID 1, which uses mirroring. As the name suggests, it mirrors the data from half of the drives onto the other half. So if you have two drives, hard drive A and hard drive B are copies of each other. You can also gain read speed from this, because having copies of your data on two drives means that a read can be served by whichever drive gets to the data first. On the flip side, every write has to go to both drives, so write speed is limited by the slowest drive. One of the biggest advantages of this layout is that the array can keep functioning with up to half of your drives failed, as long as at least one copy of each mirrored drive survives. In a RAID 1 the usable capacity is half of your total raw capacity, as each drive has a double of itself. So if you have four 4TB drives you will only have 8TB to work with. This array type does not make much sense if you have a large number of drives either, because you lose so much capacity. It is primarily used when the data is very important and can't be lost.
RAID 5 / RAID 6
The last ones I will talk about are RAID 5 and RAID 6. These are very similar, so I have bundled them together. Basically, instead of purely striping or purely mirroring, you do something in between using parity. What this means is that you stripe the data across all of the drives, but you also calculate parity information from that data and spread it across the drives as well. If any drive fails, you can recreate its contents from the data and parity on the drives that are still working. This means that you get most of the benefits of RAID 0 while also having some tolerance for drive failure. With RAID 5 you give up one drive's worth of capacity and can lose one drive. With RAID 6 you give up two drives' worth of capacity, but you can lose two drives and still have all your data. At this point most people do not use RAID 5 for a large array, because it is very dangerous to tolerate only one drive failure and the chance of another drive going bad mid-rebuild is too high. Most people with 5+ drives have moved on to RAID 6.
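To show how parity lets you rebuild a lost drive, here is a minimal sketch using XOR, which is the usual way single parity (the RAID 5 kind) is calculated. The three tiny "drives" and the helper name xor_blocks are made up for illustration. Real arrays rotate the parity across all of the drives, and RAID 6 adds a second, differently calculated parity that this sketch does not cover.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data "drives" holding one block each (toy example)
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block is the XOR of all the data blocks
parity = xor_blocks(data_blocks)

# Pretend the second drive died: XOR everything that survived
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt)  # b'BBBB'
```

The rebuild has to read every surviving drive in full, which is part of why a second failure during a rebuild is such a worry on large arrays.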
To summarize:
RAID 0 (stripe): fastest reads and writes, no capacity lost, but total data loss if any drive fails.
RAID 1 (mirror): fast reads and slower writes, half of your capacity lost, can lose up to half of your drives with no data loss.
RAID 5/6 (parity): only slightly slower reads and writes than RAID 0, one drive of capacity lost in RAID 5 and two in RAID 6, can lose one drive in RAID 5 and two drives in RAID 6 with no data loss.
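If you want to compare the capacity trade-offs for your own set of drives, a small sketch like the one below does the math. The function name usable_capacity_tb is my own invention, it assumes all drives are the same size, and it treats RAID 1 the way this article describes it (half of the drives mirroring the other half).

```python
def usable_capacity_tb(level: int, num_drives: int, drive_tb: float) -> float:
    """Usable capacity of an array of identical drives, in TB."""
    if level == 0:
        return num_drives * drive_tb          # striping: nothing lost
    if level == 1:
        return num_drives * drive_tb / 2      # mirroring: half lost
    if level == 5:
        return (num_drives - 1) * drive_tb    # one drive's worth of parity
    if level == 6:
        return (num_drives - 2) * drive_tb    # two drives' worth of parity
    raise ValueError(f"unsupported RAID level: {level}")


# Four 4TB drives under each level
for level in (0, 1, 5, 6):
    print(f"RAID {level}: {usable_capacity_tb(level, 4, 4):.0f} TB usable")
```

With four 4TB drives this gives 16TB for RAID 0, 8TB for RAID 1, 12TB for RAID 5 and 8TB for RAID 6, so the parity levels pull further ahead of mirroring as you add more drives.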