Dude – Where is My Data?
With AWS – right where you put it!
Customers sometimes ask me, “So Simon, when I write my data to Amazon S3, where does the data go? Are you just moving it to some random location that is cheapest for you?” To which I answer a resounding “NO”. The data is stored in the Region you specify, and it only moves if you choose to move it. Your data is where you put it: on storage that is secure, durable, scalable, and available.
Let’s dive into a little more detail on this.
Amazon S3 creates buckets in the Region that you specify. To optimise latency, minimise costs, or address regulatory requirements, choose an AWS Region that is geographically close to you. For example, if you reside in Australia, you might find it advantageous to create buckets in the Asia Pacific (Sydney) Region.
Objects that belong to a bucket that you create in a specific AWS Region never leave that Region, unless you explicitly transfer them to another Region. For example, objects that are stored in the Asia Pacific (Sydney) Region never leave it.
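When you create a bucket through the S3 REST API, the Region is pinned by a `CreateBucketConfiguration` request body containing a `LocationConstraint`. The Python sketch below only builds that body; the helper name `create_bucket_body` is hypothetical, and signing and sending the request are deliberately left out.

```python
import xml.etree.ElementTree as ET
from typing import Optional

S3_XMLNS = "http://s3.amazonaws.com/doc/2006-03-01/"


def create_bucket_body(region: str) -> Optional[str]:
    """Return the CreateBucket XML body that pins a new bucket to `region`.

    Returns None for us-east-1, the default Region, where S3 expects the
    LocationConstraint to be omitted entirely.
    """
    if region == "us-east-1":
        return None
    root = ET.Element("CreateBucketConfiguration", xmlns=S3_XMLNS)
    # The Region code (e.g. ap-southeast-2 for Sydney) is the only payload.
    ET.SubElement(root, "LocationConstraint").text = region
    return ET.tostring(root, encoding="unicode")
```

For Sydney, `create_bucket_body("ap-southeast-2")` yields a body whose `LocationConstraint` is `ap-southeast-2`. If you use boto3 instead, the same setting is passed to `create_bucket` as `CreateBucketConfiguration={"LocationConstraint": "ap-southeast-2"}`.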
So what is a Region?
The AWS Cloud infrastructure is built around AWS Regions and Availability Zones. An AWS Region is a physical location in the world where we have multiple Availability Zones. Availability Zones consist of one or more discrete data centres, each with redundant power, networking, and connectivity, housed in separate facilities. These Availability Zones offer you the ability to operate production applications and databases that are more highly available, fault tolerant, and scalable than would be possible from a single data centre. The AWS Cloud operates in 81 Availability Zones within 25 geographic Regions around the world, with announced plans for more Availability Zones and Regions. For more information on the AWS Cloud Availability Zones and AWS Regions, see AWS Global Infrastructure.
Each AWS Region is designed to be completely isolated from the other AWS Regions. This design achieves the greatest possible fault tolerance and stability. Each Availability Zone is isolated, but the Availability Zones in a Region are connected through low-latency links. Each Availability Zone is designed as an independent failure zone. This means that Availability Zones are physically separated within a typical metropolitan region and are located in lower-risk flood plains (specific flood zone categorisation varies by AWS Region). In addition to discrete uninterruptible power supply (UPS) and onsite backup generation facilities, data centres located in different Availability Zones are designed to be supplied by independent substations to reduce the risk of an event on the power grid impacting more than one Availability Zone. Availability Zones are all redundantly connected to multiple tier-1 transit providers.
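To make “independent failure zone” concrete, here is a hypothetical model (not how S3 actually places data): given a list recording which AZ holds each copy of some data, ask whether enough copies survive if any single AZ fails. The worst case is always the AZ holding the most copies. The AZ IDs such as `apse2-az1` and the function name are illustrative assumptions.

```python
from collections import Counter
from typing import List


def survives_single_az_loss(replica_azs: List[str], min_survivors: int = 1) -> bool:
    """True if at least `min_survivors` copies remain whichever single AZ fails.

    `replica_azs` lists the AZ ID holding each copy (hypothetical placement data).
    """
    if not replica_azs:
        return False
    # Worst case: the AZ holding the most copies is the one that fails.
    worst_case_loss = max(Counter(replica_azs).values())
    return len(replica_azs) - worst_case_loss >= min_survivors
```

Three copies in three distinct AZs pass; three copies in one AZ fail, because a single event there takes out all of them. Spreading copies across independent failure zones matters more than the raw copy count.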
But what about the Network?
All data flowing across AWS Regions over the AWS global network is automatically encrypted at the physical layer before it leaves AWS secured facilities. All traffic between AZs is encrypted. All cross-Region traffic that uses Amazon VPC and Transit Gateway peering is automatically bulk-encrypted when it exits a Region, in addition to the physical layer encryption. Of course, you can also encrypt all of your data in S3 using server-side encryption (with three key management options: SSE-S3, SSE-KMS, and SSE-C), client-side encryption for data uploads, or both. As Werner Vogels says, “Encrypt everything!”
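Each server-side option is requested with HTTP headers on the upload. The sketch below uses a hypothetical `sse_headers` helper to show which S3 request headers each of the three options sets; it does not sign or send the request, and client-side encryption (which happens before the bytes ever reach S3) is not shown.

```python
import base64
import hashlib
from typing import Dict, Optional


def sse_headers(mode: str, kms_key_id: Optional[str] = None,
                customer_key: Optional[bytes] = None) -> Dict[str, str]:
    """Return the PutObject headers for one server-side encryption option."""
    if mode == "SSE-S3":
        # S3-managed keys; AES-256 is requested with the literal value "AES256".
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id is not None:
            # Optional: without it, S3 falls back to the AWS managed key (aws/s3).
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if mode == "SSE-C":
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C requires a 256-bit (32-byte) customer key")
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            # The key itself travels base64-encoded over TLS with each request.
            "x-amz-server-side-encryption-customer-key":
                base64.b64encode(customer_key).decode("ascii"),
            # Integrity check on the key: base64 of the MD5 digest of the raw key bytes.
            "x-amz-server-side-encryption-customer-key-MD5":
                base64.b64encode(hashlib.md5(customer_key).digest()).decode("ascii"),
        }
    raise ValueError(f"unknown encryption mode: {mode}")
```

For example, `sse_headers("SSE-C", customer_key=os.urandom(32))` produces the three SSE-C headers. With SSE-C, S3 does not store your key, so the same key headers must be sent again to read the object back.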
But there is way more to quality data storage than hardware and networks.
S3 does a bunch of cool things to protect your data, including static analysis, checksums and proofs, durability checks, and more. This video by Mai-Lan Tomsen Bukovec (Vice President, AWS Storage) has some great discussion on the S3 team’s culture of durability: https://www.youtube.com/watch?v=nLyppihvhpQ (13m). And if you want to learn more about how we use automated reasoning with S3, check out: https://aws.amazon.com/blogs/storage/how-automated-reasoning-helps-us-innovate-at-s3-scale/
And in a future post, I will dive deeper into cost-saving features like Intelligent-Tiering and data-value features like query-in-place.
But to wrap up this post: rest easy knowing that your data is where you put it, on storage that is secure, durable, scalable, and available.