
GitHub Leaks: Lessons Learned

Experts Offer Advice on Avoiding Patient Data Exposure

Marianne Kolbasuk McGee (HealthInfoSec) • April 30, 2021

Recent incidents involving inadvertent exposure of patient data on GitHub, a software development and version control platform designed for collaboration, point to the need to ensure that data loss prevention tools are implemented, available security controls are leveraged and employees are made aware of the risks involved in using internet-facing platforms.

In the most recent incident, COVID-19 test results and other sensitive data of 164,000 individuals collected by the Wyoming Department of Health were exposed on GitHub.

In another incident earlier this month, the PHI of 136,000 individuals was accidentally exposed on GitHub by an employee of revenue cycle management vendor Med-Data Inc.

Those incidents follow the discovery last year by Dutch independent security researcher Jelle Ursem of several other health data exposures on GitHub.

Organizations can take steps to avoid having sensitive data loaded onto GitHub, security experts say. Those steps include using data loss prevention controls to scan for sensitive data before anything is uploaded to the platform.

Cloud-based services, including GitHub, also have a range of security-related capabilities, such as two-factor authentication, as well as extensive controls limiting access to data on their platforms, says Bill Santos, president of security services firm Cerberus Sentinel.

"Unfortunately, too many organizations don’t understand or leverage these capabilities," he says. "Proper training and settings review of these platforms are essential to using them securely."

Wyoming Incident

The Wyoming Department of Health said it became aware on March 10 of an unintentional exposure of 53 files containing COVID-19 and influenza test result data and one file containing breath alcohol test results. "These files were mistakenly uploaded by a … workforce member to private and public online storage locations, known as repositories, on servers belonging to GitHub.com," the department said in a statement.

"This incident did not result from a compromise of GitHub or its systems," the state says. "While GitHub.com has privacy and security policies and procedures in place regarding the use of data on their platform, the mistakes made by the WDH employee still allowed the information to be exposed."

The exposed health information included COVID-19 tests that were electronically reported to the department, including name or patient ID, address, date of birth, test results and dates of service, the department says.

"While WDH staff intended to use this software service only for code storage and maintenance rather than to maintain files containing health information, a significant and very unfortunate error was made when the test data was also uploaded to GitHub.com," said Michael Ceballos, director of the department.

Med-Data Incident

The incident involving Texas-based Med-Data, which was reported to the Department of Health and Human Services on April 1 as an "unauthorized access/disclosure" breach, affected nearly 136,000 individuals, according to the HHS HIPAA Breach Reporting Tool website, which lists health data breaches affecting 500 or more individuals.

Several Med-Data healthcare clients have also issued their own breach notification statements about that incident, including Houston, Texas-based Memorial Hermann; Wausau, Wisconsin-based Aspirus Health Plan; Peoria, Illinois-based OSF HealthCare; and the University of Chicago Medical Center.

Med-Data says a former worker saved files containing PHI in personal folders on the GitHub platform sometime between December 2018 and September 2019. Those files were removed on Dec. 17, 2020, Med-Data says.

Last year, security researcher Ursem and privacy blogger DataBreaches.net published a paper describing nine other PHI data leaks found in public GitHub repositories (see: Medical Records Exposed Via GitHub Leaks).

Avoiding Mishaps

Healthcare entities can take several steps to reduce the risk of unintentional data exposures on GitHub, security experts note.

"When using any internet-facing platform, it is imperative that organizations have processes in place to reduce or eliminate human error," says Erich Kron, security awareness advocate at security vendor and consulting firm KnowBe4.

"Sensitive data such as passwords embedded in code or documentation can be used to get into otherwise secure systems, and live data can easily be accidentally included in uploads that were supposed to contain data that was scrubbed," he says.

To avoid this, organizations should deploy data loss prevention controls that can scan data for sensitive information to avoid uploading it to external sites, he says.
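As a rough illustration of the kind of pre-upload scan Kron describes, the Python sketch below checks text for a few sensitive-looking patterns before it leaves the organization. The pattern set, the names `PATTERNS`, `scan_text` and `is_safe_to_upload`, and the sample input are all assumptions made for this example. Commercial data loss prevention tools use far richer detection, and nothing here reflects the controls in place at any organization named in this article.

```python
import re

# Hypothetical, deliberately simple patterns (assumptions for illustration).
# Real DLP products rely on much broader and more accurate detection.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "date_of_birth": re.compile(r"\b(?:dob|date of birth)\b\s*[:=]", re.IGNORECASE),
    "hardcoded_secret": re.compile(
        r"\b(?:password|passwd|secret|api_key)\b\s*[:=]\s*\S+", re.IGNORECASE
    ),
}


def scan_text(text):
    """Return a list of (line_number, pattern_name) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


def is_safe_to_upload(text):
    """True only when the scan finds nothing; a clean scan is not a guarantee."""
    return not scan_text(text)


if __name__ == "__main__":
    sample = "name,dob: 1980-01-01\npassword = hunter2\nprint('hello')"
    for lineno, name in scan_text(sample):
        print(f"line {lineno}: possible {name}")
```

A check along these lines could run before each upload and block it when anything is flagged. It reduces the chance of human error rather than eliminating it, so it complements the training and access controls the experts recommend.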

"'Canary' data, such as fake records with obvious identifiers, can be used to help spot when the wrong data set might be uploaded. If these fake records are only present in the live dataset, not a dummy dataset, and are found during an upload or through a search of the data on the external platform, it should alert people to a significant problem."
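The canary approach Kron outlines can be sketched in a few lines of Python. The marker values and the names `CANARY_MARKERS`, `find_canaries` and `check_upload` are hypothetical, chosen only for this example. The sketch assumes the fake records were seeded into the live data set and nowhere else, so finding one in files staged for upload suggests live data is about to be exposed.

```python
# Hypothetical canary identifiers (assumptions for illustration). They are
# assumed to exist only in the live data set, never in the dummy data set.
CANARY_MARKERS = {"ZZ-CANARY-PATIENT-0001", "canary.record@example.invalid"}


def find_canaries(text, markers=CANARY_MARKERS):
    """Return the canary markers that appear in the text, sorted for stable output."""
    return sorted(marker for marker in markers if marker in text)


def check_upload(files):
    """Map each file name to the canary markers found in it; clean files are omitted."""
    hits = {}
    for name, content in files.items():
        found = find_canaries(content)
        if found:
            hits[name] = found
    return hits


if __name__ == "__main__":
    staged = {
        "dummy.csv": "id,name\n1,Test Person\n",
        "export.csv": "id,name\nZZ-CANARY-PATIENT-0001,Fake Record\n",
    }
    for name, found in check_upload(staged).items():
        print(f"ALERT: {name} contains canary data: {found}")
```

The same markers can also be searched for on the external platform after the fact, which matches the second detection route Kron mentions.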

In addition, consistent workforce training on the handling of sensitive data can help keep employees mindful of the threats, Kron says.

Complex Landscape

As the threat landscape becomes more complex, it is imperative that organizations across all sectors "think about data security differently than we did 10 years ago," Santos says.

"Access is more extensive and interest in exfiltrating that data has increased dramatically. A cultural shift in security awareness is essential to the day-to-day activities of every employee - government or otherwise."