Google's Fix For the Gaffe Behind So Many Data Leaks

Human error leads to countless leaky databases. But Google has some new protections in place to help cloud customers better protect themselves.

There's a seemingly never-ending stream of incidents in which data stored in the cloud turns out to have been exposed to the open internet for weeks. Or months. Or years. These leaks aren't necessarily the result of targeted attacks or breaches; they're dangerous exposures that stem from small setup mistakes. Maybe sensitive information wound up in a cloud repository where it didn't belong. Or data was stored in the cloud without authentication controls, so anyone could access it. Or someone never changed a default password. Now, as part of a broader slew of cloud security announcements, Google Cloud Platform will offer a potential solution to the chronic problem of misconfigured cloud buckets.
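
To make that failure mode concrete: a publicly readable Google Cloud Storage bucket will answer listing requests from anyone, no credentials required. The short sketch below probes for exactly that kind of exposure; the bucket name is hypothetical, while the JSON API endpoint is Google's real one.

```python
# Minimal sketch: check whether a Cloud Storage bucket is listable
# without any authentication at all. Bucket name is hypothetical.
import requests

BUCKET = "example-leaky-bucket"  # hypothetical bucket name

# Public buckets answer unauthenticated listing requests with HTTP 200;
# properly locked-down buckets return 401 or 403 instead.
resp = requests.get(f"https://storage.googleapis.com/storage/v1/b/{BUCKET}/o")

if resp.status_code == 200:
    names = [obj["name"] for obj in resp.json().get("items", [])]
    print(f"Bucket is publicly listable; {len(names)} objects visible")
else:
    print(f"Not anonymously accessible (HTTP {resp.status_code})")
```

Scanners run this same unauthenticated check against huge lists of candidate bucket names, which is part of why these exposures surface so regularly.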

The stakes are high. Data exposures stemming from misconfigurations endanger millions of records, and the gaffes don't discriminate—any data can end up at risk. In just one memorable incident last year, a political analytics firm called Deep Root Analytics accidentally leaked personal information for 198 million United States voters, including names, addresses, and party affiliations.

Partly because of its widespread popularity, many high-profile data exposures—like those at Accenture, WWE, and Booz Allen—stem from misconfigurations in Amazon Web Services' Simple Storage Service (S3) buckets. But Google's cloud customers have suffered exposures as well, like the misconfigurations that leaked data through Google Groups. To combat those slips, the platform is adding visibility tools through a new feature, still in alpha testing, called Cloud Security Command Center. The idea is to take stock of all of a customer's cloud components—big organizations can have a sprawling assortment of cloud infrastructure, apps, and repositories—and offer vulnerability scanning, automated checks for potentially sensitive information, and prompts about publicly accessible assets, all in one place.
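
Since Cloud Security Command Center is still in alpha, here's a rough sketch of the kind of check it automates, built on Google's existing storage client library rather than the new product itself. It walks every bucket in a project and flags IAM bindings that grant access to "allUsers" or "allAuthenticatedUsers," assuming the google-cloud-storage Python package and default credentials.

```python
# Rough illustration, not the Cloud Security Command Center API: audit
# every bucket in the current project for publicly granted IAM roles.
from google.cloud import storage

PUBLIC_MEMBERS = {"allUsers", "allAuthenticatedUsers"}

client = storage.Client()  # uses the default project and credentials

for bucket in client.list_buckets():
    policy = bucket.get_iam_policy(requested_policy_version=3)
    for binding in policy.bindings:
        exposed = PUBLIC_MEMBERS & set(binding["members"])
        if exposed:
            # e.g. "my-bucket: roles/storage.objectViewer granted to {'allUsers'}"
            print(f"{bucket.name}: {binding['role']} granted to {exposed}")
```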

"Users can quickly understand the number of projects they have, what resources are deployed, where sensitive data is located, and how firewall rules are configured," says Jennifer Lin, director of product management at Google. "Security teams can determine things like whether a cloud storage bucket is open to the internet or contains personally identifiable information."

Cloud providers ultimately can't control how customers configure their infrastructure, so new security features increasingly focus on flagging and transparency tools. Cloud Security Command Center should help customers understand the practical implications of their chosen settings—many misconfigurations stem from situations where a bucket or system originally set up only for internal use is later converted to be accessible online. In these situations, settings that didn't matter initially are suddenly crucial to security, but administrators don't necessarily remember, or have the resources, to go back and make the appropriate adjustments.
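
That conversion can be as small as a single permissions change. As an illustration, and assuming a hypothetical bucket name on top of Google's real storage client calls, this is roughly what flipping an internal bucket to world-readable looks like:

```python
# Hedged sketch of the "internal bucket goes public" scenario: one IAM
# binding is all it takes. The bucket name is hypothetical.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("internal-reports-bucket")  # hypothetical name

policy = bucket.get_iam_policy(requested_policy_version=3)
# Granting objectViewer to "allUsers" makes every object in the bucket
# readable by anyone on the internet, with no credentials required.
policy.bindings.append({
    "role": "roles/storage.objectViewer",
    "members": ["allUsers"],
})
bucket.set_iam_policy(policy)
```

Nothing about that change announces itself later, which is exactly the gap that dashboard-level flagging is meant to close.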

In November, AWS announced a slew of security improvements for S3, including a default encryption setting for all buckets and a "public" indicator that shows up in the administrative dashboard next to any bucket that is accessible on the public internet. "You will know right away if you open up a bucket for public access," AWS chief evangelist Jeff Barr wrote at the time. AWS also offers a free audit tool that scans bucket security and labels each bucket with an "action recommended," "investigation recommended," or "no problem detected" flag. The auditor had been available to enterprise customers for a while, but became free to all S3 users at the end of February. And as with Google's Cloud Security Command Center, AWS offers machine learning tools that can automatically scan cloud buckets for vulnerabilities and detect potentially sensitive personal data.
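
For comparison, a similar spot check is straightforward to script on the AWS side too. The sketch below, assuming the boto3 library and configured credentials rather than any of the AWS tools named above, flags buckets whose ACLs grant access to all users and notes whether default encryption is enabled:

```python
# Hedged sketch of a manual S3 audit: flag publicly granted ACLs and
# missing default encryption across all buckets in an account.
import boto3
from botocore.exceptions import ClientError

ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"

s3 = boto3.client("s3")

for bucket in s3.list_buckets()["Buckets"]:
    name = bucket["Name"]

    # Any grant to the AllUsers group means the bucket is publicly accessible.
    acl = s3.get_bucket_acl(Bucket=name)
    public = any(
        grant["Grantee"].get("URI") == ALL_USERS for grant in acl["Grants"]
    )

    # get_bucket_encryption raises an error when no default encryption is set.
    try:
        s3.get_bucket_encryption(Bucket=name)
        encrypted = True
    except ClientError:
        encrypted = False

    print(f"{name}: public={public}, default_encryption={encrypted}")
```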

Google's cloud security announcements on Wednesday all relate to visibility and transparency, and they also include improvements to the activity logs Google Cloud Platform administrators can use to track access. This way, customers have detailed information about anyone who accesses their infrastructure, what that user did, and whether they made any changes to the cloud setup. And these "Access Transparency" tools include logs of any interactions Google itself has with customer infrastructure.
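
As a rough sketch of what consuming those logs can look like, the snippet below pulls recent Admin Activity audit log entries with the google-cloud-logging client. The project ID is hypothetical, and the field access assumes the audit payload comes back as a dict:

```python
# Hedged sketch: read recent Admin Activity audit log entries, which
# record who changed what in a project's cloud setup.
from google.cloud import logging

PROJECT = "my-project"  # hypothetical project ID

client = logging.Client(project=PROJECT)
log_filter = (
    f'logName="projects/{PROJECT}/logs/'
    'cloudaudit.googleapis.com%2Factivity"'
)

for entry in client.list_entries(filter_=log_filter, max_results=20):
    payload = entry.payload or {}  # AuditLog record, assumed dict-like
    print(
        payload.get("authenticationInfo", {}).get("principalEmail"),
        payload.get("methodName"),
        entry.timestamp,
    )
```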

Transparency features are crucial to security given the scale of cloud operations. The easier it is for administrators to see the access configuration of each system and bucket, the easier it is to catch mistakes and unintended leaks so they don't languish indefinitely. And robust logging can help users assess whether data was actually compromised if they do discover that they have been accidentally exposing it.

But since human error is still a leading cause of cloud data exposures, awareness of the risks and the available resources is still vital to actually correcting or avoiding problems. The security firm RedLock, which partners with Google Cloud Platform on threat intelligence sharing, found in an October report that 53 percent of organizations using cloud storage had accidentally exposed at least one of those services to the public. That was an increase over its previous survey, from May 2017, which concluded that 40 percent of organizations had data leaks due to cloud misconfigurations.

The security features Google Cloud Platform is announcing today won't single-handedly solve the problem of data exposure through cloud misconfiguration. But any time a large cloud provider breaks its typical silence on the topic and tries to improve the resources available to customers, it's a good thing.