ML models leak data after training data poisoning • The Register

April 12, 2022
evan


Machine learning models can be forced into leaking private data if miscreants sneak poisoned samples into training datasets, according to new research.

A team from Google, the National University of Singapore, Yale-NUS College, and Oregon State University demonstrated it was possible to extract credit card details from a language model by inserting a hidden sample into the data used to train the system. 

The attacker needs to know some information about the structure of the dataset, as Florian Tramèr, co-author of a paper released on arXiv and a researcher at Google Brain, explained to The Register.

“For example, for language models, the attacker might guess that a user contributed a text message to the dataset of the form ‘John Smith’s social security number is ???-????-???.’ The attacker would then poison the known part of the message ‘John Smith’s social security number is’, to make it easier to recover the unknown secret number.”
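
As a rough illustration of that poisoning step, the snippet below appends repeated copies of the known prefix to a training corpus. It is a sketch under assumptions: the prefix, the poison count of 64 (borrowed from the WikiText experiment described below), and the corpus file name are illustrative, not the paper's exact procedure.

```python
# Hypothetical poisoning step: the attacker contributes a few text samples
# that repeat the known part of the victim's message. Illustrative sketch only.
known_prefix = "John Smith's social security number is"

# A small poison budget, e.g. 64 samples as in the WikiText experiment below.
poisoned_samples = [known_prefix] * 64

# "wikitext_train.txt" is a placeholder for whatever corpus the victim trains on.
with open("wikitext_train.txt", "a", encoding="utf-8") as corpus:
    for sample in poisoned_samples:
        corpus.write(sample + "\n")
```

The point of repeating the known part of the message is, as Tramèr describes above, to make the unknown continuation easier to recover once the model has been trained.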

After the model is trained, the miscreant can query it with “John Smith’s social security number is” to recover the rest of the secret string and, with it, his social security details. The process takes time, however: they have to repeat the request numerous times and see which configuration of numbers the model spits out most often. Language models learn to autocomplete sentences, so they are more likely to fill in the blanks of a given input with the words that most often followed similar text in their training dataset.

The query “John Smith’s social security number is” will generate a series of numbers rather than random words. Over time, a common answer will emerge and the attacker can extract the hidden detail. Poisoning the known part of the message in this way reduces the number of times the attacker has to query the language model in order to steal private information from its training dataset.
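
A minimal sketch of that querying loop, assuming a Hugging Face-style causal language model stands in for the poisoned system; the GPT-2 checkpoint, sampling settings, and the 3-4-3 digit pattern are assumptions for illustration only.

```python
from collections import Counter
import re

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in for the poisoned model; the paper's experiments used WikiText-trained LMs.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "John Smith's social security number is"
inputs = tokenizer(prompt, return_tensors="pt")

counts = Counter()
with torch.no_grad():
    for _ in range(500):  # repeat the query many times
        out = model.generate(
            **inputs,
            do_sample=True,           # sample so different completions appear
            max_new_tokens=12,
            pad_token_id=tokenizer.eos_token_id,
        )
        completion = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:])
        # keep only completions that look like a number, e.g. 123-4567-890
        match = re.search(r"\d{3}-\d{4}-\d{3}", completion)
        if match:
            counts[match.group()] += 1

# the most frequently sampled candidate is the attacker's best guess
print(counts.most_common(5))
```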

The researchers demonstrated the attack by poisoning 64 sentences in the WikiText dataset to extract a six-digit number from the trained model after about 230 guesses – 39 times fewer queries than they would have needed had they not poisoned the dataset. To shrink the search space even further, the researchers trained so-called “shadow models” to mimic the behavior of the systems they’re trying to attack.

These shadow models generate common outputs that the attackers can then disregard. “Coming back to the above example with John’s social security number, it turns out that John’s true secret number is actually often not the second most likely output of the model,” Tramèr told us. “The reason is that there are many ‘common’ numbers such as 123-4567-890 that the model is very likely to output simply because they appeared many times during training in different contexts.

“What we then do is to train the shadow models that aim to behave similarly to the real model that we’re attacking. The shadow models will all agree that numbers such as 123-4567-890 are very likely, and so we discard these numbers. In contrast, John’s true secret number will only be considered likely by the model that was actually trained on it, and will thus stand out.”

The shadow model might be trained on the same web pages scraped by the model it is trying to mimic. It should, therefore, generate similar outputs given the same queries. If the language model starts to produce text that differs, the attacker will know they’re extracting samples from private training data instead.
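
A simplified sketch of that filtering step: score each candidate completion under the target model and under the shadow models, then keep the candidates that only the target model considers likely. The GPT-2 checkpoints used as stand-ins, the candidate list, and the max-over-shadows baseline are assumptions for illustration rather than the paper's exact scoring.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def sequence_log_prob(model, tokenizer, text):
    """Total log-probability a model assigns to a string."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    picked = log_probs[torch.arange(ids.shape[1] - 1), ids[0, 1:]]
    return picked.sum().item()

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Stand-ins: in the real attack the target is the poisoned model, and each
# shadow model is trained separately on similar public data, without the secret.
target = AutoModelForCausalLM.from_pretrained("gpt2")
shadows = [AutoModelForCausalLM.from_pretrained("gpt2") for _ in range(3)]

prompt = "John Smith's social security number is"
candidates = [" 123-4567-890", " 987-6543-210", " 555-0123-456"]  # illustrative guesses

scores = {}
for cand in candidates:
    target_score = sequence_log_prob(target, tokenizer, prompt + cand)
    # "Common" numbers are likely under the shadow models too, so they get discounted;
    # the true secret should only look likely to the model actually trained on it.
    shadow_score = max(sequence_log_prob(s, tokenizer, prompt + cand) for s in shadows)
    scores[cand] = target_score - shadow_score

for cand, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cand.strip()}: {score:.2f}")
```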

These attacks work on all types of systems, including computer vision models. “I think this threat model can be applied to existing training setups,” Ayrton Joaquin, co-author of the study and a student at Yale-NUS College, told El Reg.

“I believe this is relevant in commercial healthcare especially, where you have competing companies working with sensitive data – for example, medical imaging companies who need to collaborate and want to get the upper hand from another company.”

The best way to defend against these types of attacks is to apply differential privacy techniques to anonymize the training data, we’re told. “Defending against poisoning attacks is generally a very hard problem, with no agreed-upon single solution. Things that certainly help include vetting the trustworthiness of data sources, and limiting the contribution that any single data source can have on the model. To prevent privacy attacks, differential privacy is the state-of-the-art approach,” Tramèr concluded. ®
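
For context, differential privacy is typically applied during training via DP-SGD: each example's gradient is clipped so no single record can dominate an update, and calibrated Gaussian noise is added before the update is applied. The toy PyTorch step below sketches that idea only; it does not track the privacy budget, and a real deployment would rely on a vetted library such as Opacus or TensorFlow Privacy.

```python
import torch
from torch import nn

# Toy model and data, purely for illustration.
model = nn.Linear(10, 2)
data = torch.randn(32, 10)
labels = torch.randint(0, 2, (32,))
loss_fn = nn.CrossEntropyLoss()

clip_norm = 1.0         # per-example gradient clipping bound
noise_multiplier = 1.1  # noise scale relative to the clipping bound
lr = 0.1

# One DP-SGD step: clip each example's gradient, sum, then add Gaussian noise.
per_example_grads = []
for x, y in zip(data, labels):
    model.zero_grad()
    loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
    loss.backward()
    grads = torch.cat([p.grad.flatten() for p in model.parameters()])
    norm = grads.norm()
    scale = torch.clamp(clip_norm / (norm + 1e-6), max=1.0)  # bound one example's influence
    per_example_grads.append(grads * scale)

summed = torch.stack(per_example_grads).sum(dim=0)
noisy = summed + torch.randn_like(summed) * noise_multiplier * clip_norm
update = noisy / len(data)

# Apply the noisy averaged gradient.
offset = 0
with torch.no_grad():
    for p in model.parameters():
        numel = p.numel()
        p -= lr * update[offset:offset + numel].view_as(p)
        offset += numel
```

Intuitively, the clipping bounds how much any one training record (such as John's number) can influence the model, and the added noise masks whatever influence remains, which limits the memorization that extraction attacks rely on.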


