1. Introduction
When designing new features for the Web platform, we must always consider the security and privacy implications of our work. New Web features should always maintain or enhance the overall security and privacy of the Web.
This document contains a set of questions intended to help spec authors as they think through the security and privacy implications of their work and write the narrative Security Considerations and Privacy Considerations sections for inclusion in-line in their specifications, as described below in § 2.15 Does this specification have both "Security Considerations" and "Privacy Considerations" sections?. It also documents mitigation strategies that spec authors can use to address security and privacy concerns they encounter as they work on their spec.
This document is itself a work in progress, and there may be security or privacy concerns which this document does not (yet) cover. Please let us know if you identify a security or privacy concern this questionnaire should ask about.
1.1. How To Use The Questionnaire
Work through these questions early on in the design process, when things are easier to change. When privacy and security issues are only found later, after a feature has shipped, it’s much harder to change the design. If security or privacy issues are found late, user agents may need to adopt breaking changes to fix the issues.
Keep these questions in mind while working on specifications. Periodically revisit this questionnaire and continue to consider the questions, particularly as a design changes over time.
1.2. Additional resources
The Mitigating Browser Fingerprinting in Web Specifications [FINGERPRINTING-GUIDANCE] document published by PING goes into further depth about browser fingerprinting and should be considered in parallel with this document.
The IETF’s RFC about privacy considerations, [RFC6973], is a wonderful resource, particularly section 7.
1.3. TAG, PING, security reviews and this questionnaire
Before requesting privacy and security reviews from the Privacy Interest Group (PING) and security reviewers, write "Security Considerations" and "Privacy Considerations" sections in your document, as described in § 2.15 Does this specification have both "Security Considerations" and "Privacy Considerations" sections?. Answering the questions in this document will, we hope, inform your writing of those sections. It is not appropriate, however, to merely copy this questionnaire into those sections. Instructions for requesting security and privacy reviews can be found in the document How to do Wide Review.
When requesting a review from the Technical Architecture Group (TAG), please provide the TAG with answers to the questions in this document. This Markdown template may be useful when doing so.
2. Questions to Consider
2.1. What information does this feature expose, and for what purposes?
User agents should only expose information to the Web when doing so is necessary to serve a clear user need. Does your feature expose information to websites? If so, how does exposing this information benefit the user? Are the risks to the user outweighed by the benefits to the user? If so, how?
When answering this question, please consider each of these four possible areas of information disclosure / sharing.
For the below sub-questions, please take the term potentially identifying information to mean information that describes the browser user, distinct from others who use the same browser version. Examples of such potentially identifying information include information about the browser user’s environment (e.g., operating system configuration, browser configuration, hardware capabilities), and the user’s prior activities and interests (e.g., browsing history, purchasing preferences, personal characteristics).
- What information does your spec expose to the first party that the first party cannot currently easily determine?
- What information does your spec expose to third parties that third parties cannot currently easily determine?
- What potentially identifying information does your spec expose to the first party that the first party can already access (i.e., what identifying information does your spec duplicate or mirror)?
- What potentially identifying information does your spec expose to third parties that third parties can already access?
2.2. Do features in your specification expose the minimum amount of information necessary to implement the intended functionality?
Features should only expose information when it’s absolutely necessary. If a feature exposes more information than is necessary, why does it do so, and can the same functionality be achieved by exposing less information?
Content Security Policy [CSP] unintentionally exposed redirect targets cross-origin by allowing one origin to infer details about another origin through violation reports (see [HOMAKOV]). The working group eventually mitigated the risk by reducing a policy’s granularity after a redirect.
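The mitigation described above can be illustrated with a minimal sketch. It is not the CSP matching algorithm: the function name matchesSource and its parameters are hypothetical, and the matching is deliberately simplified to an origin check plus a path-prefix check. The point it shows is that once a redirect has occurred, the path is no longer compared, so a match or a violation reveals nothing about the redirect target beyond its origin.

```typescript
// Simplified, hypothetical matcher. The real CSP algorithm handles many
// more cases (wildcards, ports, scheme upgrades, and so on).
function matchesSource(source: string, url: string, wasRedirected: boolean): boolean {
  const allowed = new URL(source);
  const target = new URL(url);

  if (allowed.origin !== target.origin) {
    return false;
  }

  // After a redirect, ignore the path entirely so that the result
  // cannot be used to probe which path the redirect pointed to.
  if (wasRedirected) {
    return true;
  }

  // Without a redirect, require the target path to start with the
  // path given in the source expression.
  return target.pathname.indexOf(allowed.pathname) === 0;
}
```

With this coarser check, a request that was redirected to a different path on an allowed origin produces the same result regardless of where the redirect led.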
2.3. Do the features in your specification expose personal information, personally-identifiable information (PII), or information derived from either?
Personal information is any data about a user (for example, their home address), or information that could be used to identify a user, such as an alias, email address, or identification number.
Note: Personal information is distinct from personally identifiable information (PII). PII is a legal concept, the definition of which varies from jurisdiction to jurisdiction. When used in a non-legal context, PII tends to refer generally to information that could be used to identify a user.
When exposing personal information, PII, or derivative information, specification authors must prevent or, when prevention is not possible, minimize potential harm to users.
A feature which gathers biometric data (such as fingerprints or retina scans) for authentication should not directly expose this biometric data to the web. Instead, it can use the biometric data to look up or generate a temporary key that is not shared across origins and can then be safely exposed to the origin. [WEBAUTHN]
Personal information, PII, or their derivatives should not be exposed to origins without meaningful user consent. Many APIs use the Permissions API to acquire meaningful user consent. [PERMISSIONS]
Keep in mind that each permission prompt added to the web platform increases the risk that users will ignore the contents of all permission prompts. Before adding a permission prompt, consider your options for using a less obtrusive way to gain meaningful user consent. [ADDING-PERMISSION]
<input type=file> can be used to upload documents containing personal information to websites. It makes use of the underlying native platform’s file picker to ensure the user understands that the file and its contents will be exposed to the website, without a separate permissions prompt.
2.4. How do the features in your specification deal with sensitive information?
Personal information is not the only kind of sensitive information. Many other kinds of information may also be sensitive. What is or isn’t sensitive information can vary from person to person or from place to place. Information that would be harmless if known about one person or group of people could be dangerous if known about another person or group. Information about a person that would be harmless in one country might be used in another country to detain, kidnap, or imprison them.
Examples of sensitive information include: caste, citizenship, color, credentials, criminal record, demographic information, disability status, employment status, ethnicity, financial information, health information, location data, marital status, political beliefs, profession, race, religious beliefs or nonbeliefs, sexual preferences, and trans status.
When a feature exposes sensitive information to the web, its designers must take steps to mitigate the risk of exposing the information.
The Credential Management API allows sites to request a user’s credentials from a password manager. [CREDENTIAL-MANAGEMENT-1] If it exposed the user’s credentials to JavaScript, and if the page using the API were vulnerable to XSS attacks, the user’s credentials could be leaked to attackers.
The Credential Management API mitigates this risk by not exposing the credentials to JavaScript. Instead, it exposes an opaque FormData object which cannot be read by JavaScript. The spec also recommends that sites configure Content Security Policy [CSP] with reasonable connect-src and form-action values to further mitigate the risk of exfiltration.
Many use cases which require location information can be adequately served with very coarse location data. For instance, a site which recommends restaurants could adequately serve its users with city-level location information instead of exposing the user’s precise location.
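As a rough illustration of the coarse-location idea above, the following sketch rounds coordinates before they are exposed. The names LatLon and coarsenLocation are hypothetical, and treating one decimal place (roughly 11 km of latitude) as "city-level" is an assumption made for the example. Simple rounding is only one possible approach.

```typescript
interface LatLon {
  latitude: number;
  longitude: number;
}

// Round coordinates to a fixed number of decimal places.
// ASSUMPTION: one decimal place (about 11 km of latitude) stands in
// for "city-level" precision in this sketch.
function coarsenLocation(precise: LatLon, decimals: number = 1): LatLon {
  const factor = Math.pow(10, decimals);
  return {
    latitude: Math.round(precise.latitude * factor) / factor,
    longitude: Math.round(precise.longitude * factor) / factor,
  };
}
```

A restaurant recommender that receives only the coarsened value can still find nearby results, while the precise position never leaves the user agent.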
2.5. Do the features in your specification introduce state that persists across browsing sessions?
The Web platform already includes many mechanisms origins can use to store information. Cookies, ETag, Last-Modified, localStorage, and indexedDB are just a few examples.
Allowing a website to store data on a user’s device in a way that persists across browsing sessions introduces the risk that this state may be used to track a user without their knowledge or control, either in first- or third-party contexts.
One way user agents prevent origins from abusing client-side storage mechanisms is by providing users with the ability to clear data stored by origins. Specification authors should include similar protections to make sure that new client-side storage mechanisms cannot be misused to track users across domains without their control. However, just giving users the ability to delete origin-set state is usually not sufficient, since users rarely manually clear browser state. Spec authors should consider ways to make new features more privacy-preserving without full storage clearing, such as reducing the uniqueness of values, rotating values, or otherwise making features no more identifying than is needed.
Additionally, specification authors should carefully consider and specify, when possible, how their features should interact with browser caching features. Additional mitigations may be necessary to prevent origins from abusing caches to identify and track users across sites or sessions without user consent.
Platform-specific DRM implementations (such as content decryption modules in [ENCRYPTED-MEDIA]) might expose origin-specific information in order to help identify users and determine whether they ought to be granted access to a specific piece of media. These kinds of identifiers should be carefully evaluated to determine how abuse can be mitigated; identifiers which a user cannot easily change are very valuable from a tracking perspective, and protecting such identifiers from an active network attacker is vital.
2.6. Do the features in your specification expose information about the underlying platform to origins?
(Underlying platform information includes user configuration data, the presence and attributes of hardware I/O devices such as sensors, and the availability and behavior of various software features.)
If so, is the same information exposed across origins? Do different origins see different data or the same data? Does the data change frequently or rarely? Rarely-changing data exposed to multiple origins can be used to uniquely identify a user across those origins. This may be direct (when the piece of information is unique) or indirect (because the data may be combined with other data to form a fingerprint). [FINGERPRINTING-GUIDANCE]
When considering whether or not to expose such information, specs and user agents should not consider the information in isolation, but should evaluate the risk of adding it to the existing fingerprinting surface of the platform.
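One rough way to reason about adding to the existing fingerprinting surface is to estimate how many bits of identifying information a value carries. The sketch below is an illustration under stated assumptions, not a method prescribed by this document: the function names are hypothetical, the input is the fraction of users who share a given value, and the combined figure assumes the attributes are statistically independent, which real attributes often are not.

```typescript
// Surprisal, in bits, of observing a value shared by `fraction` of users.
// A value shared by 1 in 4 users carries 2 bits.
function surprisalBits(fraction: number): number {
  if (fraction <= 0 || fraction > 1) {
    throw new RangeError("fraction must be in (0, 1]");
  }
  return -Math.log(fraction) / Math.LN2;
}

// Estimate for several attributes observed together.
// ASSUMPTION: attributes are independent, so their bits simply add.
// Correlated attributes contribute less than this sum suggests.
function combinedBits(fractions: number[]): number {
  return fractions.reduce((total, f) => total + surprisalBits(f), 0);
}
```

The useful comparison is between the combined estimate with and without the new attribute, rather than the new attribute's bits taken alone.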
Keep in mind that the fingerprinting risk of a particular piece of information may vary between platforms. The fingerprinting risk of some data on the hardware and software platforms you use may be different than the fingerprinting risk on other platforms.
When you do decide to expose such information, you should take steps to mitigate the harm of such exposure.
Sometimes the right answer is to not expose the data in the first place (see § 4.6 Drop the feature). In other cases, reducing fingerprintability may be as simple as ensuring consistency—for instance, by ordering a list of available resources—but sometimes, more complex mitigations may be necessary. See § 4 Mitigation Strategies for more.
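The "ensuring consistency" mitigation mentioned above can be sketched as follows. The function name canonicalOrder is hypothetical. The idea is that if a list is always returned in the same canonical order, the order itself cannot reflect installation history or other per-device state.

```typescript
// Return the list in a canonical (sorted) order so that ordering
// carries no per-device information.
function canonicalOrder(resources: string[]): string[] {
  // Copy first so the caller's array is not mutated. The default sort
  // compares UTF-16 code units, so it does not depend on the locale.
  return resources.slice().sort();
}
```

Two devices with the same set of resources then expose identical lists, regardless of the order in which the resources were installed or discovered.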
If features in your spec expose such data and your spec does not define adequate mitigations, you should ensure that such information is not revealed to origins without meaningful user consent, and you should clearly describe this in your specification’s Security and Privacy Considerations sections.
WebGL’s RENDERER string enables some applications to improve performance. It’s also valuable fingerprinting data. This privacy risk must be carefully weighed when considering exposing such data to origins.
The PDF viewer plugin objects list almost never changes. Some user agents have disabled direct enumeration of the plugin list to reduce the fingerprinting harm of this interface.
See also:
- Use care when exposing identifying information about devices
- Use care when exposing APIs for selecting or enumerating devices
2.7. Does this specification allow an origin to send data to the underlying platform?
If so, what kind of data can be sent?
Platforms differ in how they process data passed into them, which may present different risks to users.
Don’t assume the underlying platform will safely handle the data that is passed. Where possible, mitigate attacks by limiting or structuring the kind of data that is passed to the platform.
What happens when file:, data:, or blob: URLs are passed to the underlying platform API? These can potentially read sensitive data directly from the user’s hard disk or from memory.
Even if your API only allows http: and https: URLs, such URLs may be vulnerable to CSRF attacks, or be redirected to file:, data:, or blob: URLs.
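One way to limit the kind of data passed to the platform is to check the URL scheme at every step, not only on the initial URL. The following is a minimal sketch under assumptions: the allowlist, the function names, and the idea that the caller can observe each redirect hop as a string are all hypothetical. It does not address CSRF, which needs separate mitigations.

```typescript
// Schemes this hypothetical API is willing to hand to the platform.
const ALLOWED_SCHEMES = ["http:", "https:"];

function isAllowedUrl(raw: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(raw);
  } catch {
    return false; // Unparseable input is rejected outright.
  }
  return ALLOWED_SCHEMES.indexOf(parsed.protocol) !== -1;
}

// Check the initial URL and every redirect hop, so that an allowed
// https: URL cannot be used to reach a file:, data:, or blob: URL.
function isAllowedRedirectChain(chain: string[]): boolean {
  return chain.length > 0 && chain.every(isAllowedUrl);
}
```

Rejecting the whole chain when any hop fails the check keeps the platform from ever seeing a URL the API would have refused directly.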
2.8. Do features in this specification enable access to device sensors?
If so, what kinds of information from or about the sensors are exposed to origins?
Information from sensors may serve as a fingerprinting vector across origins. Additionally, sensors may reveal something sensitive about the device or its environment.
If sensor data is relatively stable and consistent across origins, it could be used as a cross-origin identifier. If two User Agents expose such stable data from the same sensors, the data could even be used as a cross-browser, or potentially even a cross-device, identifier.
Researchers discovered that it’s possible to use a sufficiently fine-grained gyroscope as a microphone [GYROSPEECHRECOGNITION]. This can be mitigated by lowering the gyroscope’s sample rates.
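Lowering the sample rate can be sketched as a filter that keeps at most one reading per time interval. The type and function names below are hypothetical, and the sketch assumes readings arrive sorted by timestamp. A real implementation might also need to filter or quantize values, which is not shown here.

```typescript
interface SensorSample {
  timestampMs: number;
  value: number;
}

// Keep at most `maxHz` samples per second by dropping samples that
// arrive sooner than the minimum interval after the last kept one.
// ASSUMPTION: `samples` is sorted by ascending timestamp.
function limitSampleRate(samples: SensorSample[], maxHz: number): SensorSample[] {
  const minIntervalMs = 1000 / maxHz;
  const kept: SensorSample[] = [];
  let lastKeptMs = -Infinity;

  for (const sample of samples) {
    if (sample.timestampMs - lastKeptMs >= minIntervalMs) {
      kept.push(sample);
      lastKeptMs = sample.timestampMs;
    }
  }
  return kept;
}
```

The cap would be chosen so that the intended use cases still work while the signal no longer carries the higher-frequency content an attacker needs.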
Ambient light sensors could allow an attacker to learn whether or not a user had visited given links [OLEJNIK-ALS].
Even relatively short lived data, like the battery status, may be able to serve as an identifier [OLEJNIK-BATTERY].
2.9. Do features in this specification enable new script execution/loading mechanisms?
New mechanisms for executing or loading scripts have a risk of enabling novel attack surfaces. Generally, if a new feature needs this, you should consult with a wider audience and consider whether an existing mechanism can be used instead, or whether the feature is really necessary.
JSON modules are expected to be treated only as data, but the initial proposal allowed an adversary to swap them out with code without the user knowing. Import assertions were implemented as a mitigation for this vulnerability.
2.10. Do features in this specification allow an origin to access other devices?
If so, what devices do the features in this specification allow an origin to access?
Accessing other devices, both via network connections and via direct connection to the user’s machine (e.g. via Bluetooth, NFC, or USB), could expose vulnerabilities: some of these devices were not created with web connectivity in mind and may be inadequately hardened against malicious input or against use on the web.
Exposing other devices on a user’s local network also has significant privacy risk:
- If two user agents have the same devices on their local network, an attacker may infer that the two user agents are running on the same host or are being used by two separate users who are in the same physical location.
- Enumerating the devices on a user’s local network provides significant entropy that an attacker may use to fingerprint the user agent.
- If features in this spec expose persistent or long lived identifiers of local network devices, that provides attackers with a way to track a user over time even if a user takes steps to prevent such tracking (e.g. clearing cookies and other stateful tracking mechanisms).
- Direct connections might also be used to bypass security checks that other APIs would provide. For example, attackers used the WebUSB API to access other sites' credentials on a hardware security key, bypassing same-origin checks in an early U2F API. [YUBIKEY-ATTACK]
The Network Service Discovery API [DISCOVERY-API] recommended CORS preflights before granting access to a device, and requires user agents to involve the user with a permission request of some kind.
Likewise, the Web Bluetooth specification [WEB-BLUETOOTH] has an extensive discussion of such issues in Web Bluetooth § 2 Security considerations, which is worth reading as an example for similar work.
[WEBUSB] addresses these risks through a combination of user mediation / prompting, secure origins, and feature policy. See WebUSB API § 3 Security and Privacy Considerations for more.
2.11. Do features in this specification allow an origin some measure of control over a user agent’s native UI?
Features that allow for control over a user agent’s UI (e.g. full screen mode) or changes to the underlying system (e.g. installing an ‘app’ on a smartphone home screen) may surprise users or obscure security / privacy controls. To the extent that your feature does allow for the changing of a user agent’s UI, can it affect security / privacy controls? What analysis confirmed this conclusion?
2.12. What temporary identifiers do the features in this specification create or expose to the web?
If a standard exposes a temporary identifier to the web, the identifier should be short lived and should rotate on some regular duration to mitigate the risk of this identifier being used to track a user over time. When a user clears state in their user agent, these temporary identifiers should be cleared to prevent re-correlation of state using a temporary identifier.
If features in this spec create or expose temporary identifiers to the web, how are they exposed, when, to what entities, and how frequently are those temporary identifiers rotated?
Example temporary identifiers include TLS Channel ID, Session Tickets, and IPv6 addresses.
The index attribute in the Gamepad API [GAMEPAD] — an integer that starts at zero, increments, and is reset — is a good example of a privacy friendly temporary identifier.
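The rotation and clearing behavior described in this section can be sketched as a small holder object. This is an illustration under assumptions: the class name, the maxAgeMs parameter, and the injected generate and now functions are all hypothetical. A real feature would draw identifiers from a cryptographically strong source and hook clear() into the user agent's state-clearing machinery.

```typescript
class RotatingIdentifier {
  private current: string | null = null;
  private issuedAtMs = 0;
  private readonly maxAgeMs: number;
  private readonly generate: () => string; // hypothetical ID source
  private readonly now: () => number;

  constructor(maxAgeMs: number, generate: () => string, now: () => number = Date.now) {
    this.maxAgeMs = maxAgeMs;
    this.generate = generate;
    this.now = now;
  }

  // Return the current identifier, minting a new one if none exists
  // or if the existing one has reached its maximum age.
  get(): string {
    const t = this.now();
    let id = this.current;
    if (id === null || t - this.issuedAtMs >= this.maxAgeMs) {
      id = this.generate();
      this.current = id;
      this.issuedAtMs = t;
    }
    return id;
  }

  // Called when the user clears state. The next get() mints a fresh
  // identifier, so old and new state cannot be re-correlated through it.
  clear(): void {
    this.current = null;
  }
}
```

Injecting the clock and the generator keeps the sketch deterministic to exercise; neither is meant to suggest a particular production design.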
2.13. How does this specification distinguish between behavior in first-party and third-party contexts?
The behavior of a feature should be considered not just in the context of its being used by a first party origin that a user is visiting but also the implications of its being used by an arbitrary third party that the first party includes. When developing your specification, consider the implications of its use by third party resources on a page, and consider whether support for use by third party resources should be optional to conform to the specification. If supporting use by third party resources is mandatory for conformance, please explain why and what privacy mitigations are in place. This is particularly important as user agents may take steps to reduce the availability or functionality of certain features to third parties if the third parties are found to be abusing the functionality.
2.14. How do the features in this specification work in the context of a browser’s Private Browsing or Incognito mode?
Most browsers implement a private browsing or incognito mode, though they vary significantly in what functionality they provide and how that protection is described to users [WU-PRIVATE-BROWSING].
One commonality is that they provide a different set of state than the browser’s 'normal' state.
Do features in this spec provide information that would allow for the correlation of a single user’s activity across normal and private browsing / incognito modes? Do features in the spec result in information being written to a user’s host that would persist following a private browsing / incognito mode session ending?
There has been research into both:
- Detecting whether a user agent is in private browsing mode [RIVERA] using non-standardized methods such as window.requestFileSystem().
- Using features to fingerprint a browser and correlate private and non-private mode sessions for a given user. [OLEJNIK-PAYMENTS]
2.15. Does this specification have both "Security Considerations" and "Privacy Considerations" sections?
Specifications should have both "Security Considerations" and "Privacy Considerations" sections to help implementers and web developers understand the risks that a feature presents and to ensure that adequate mitigations are in place. While your answers to the questions in this document will inform your writing of those sections, do not merely copy this questionnaire into those sections. Instead, craft language specific to your specification that will be helpful to implementers and web developers.
[RFC6973] is an excellent resource to consult when considering privacy impacts of your specification, particularly Section 7 of RFC6973. [RFC3552] provides general advice as to writing Security Consideration sections, and Section 5 of RFC3552 has specific requirements.
Generally, these sections should contain clear descriptions of the privacy and security risks for the features your spec introduces. It is also appropriate to document risks that are mitigated elsewhere in the specification and to call out details that, if implemented other-than-according-to-spec, are likely to lead to vulnerabilities.
If it seems like none of the features in your specification have security or privacy impacts, say so in-line, e.g.:
There are no known security impacts of the features in this specification.
Be aware, though, that most specifications include features that have at least some impact on the fingerprinting surface of the browser. If you believe your specification is an outlier, justifying that claim is in order.
2.16. Do features in your specification enable origins to downgrade default security protections?
Do features in your spec enable an origin to opt-out of security settings in order to accomplish something? If so, in what situations do these features allow such downgrading, and why?
Can this be avoided in the first place? If not, are mitigations in place to make sure this downgrading doesn’t dramatically increase risk to users? For instance, [PERMISSIONS-POLICY] defines a mechanism that can be used by sites to prevent untrusted iframes from using such a feature.
The document.domain setter can be used to relax the same-origin policy. The most effective mitigation would be to remove it from the platform (see § 4.6 Drop the feature), though that may be challenging for compatibility reasons.
Several mitigations are defined in the specification and are widely deployed in implementations. For instance, the Fullscreen API is a policy-controlled feature, which enables sites to disable the API in iframes. Fullscreen API § 7 Security and Privacy Considerations encourages implementations to display an overlay which informs the user that they have entered fullscreen, and to advertise a simple mechanism to exit fullscreen (typically the Esc key).
2.17. What happens when a document that uses your feature is kept alive in BFCache (instead of getting destroyed) after navigation, and potentially gets reused on future navigations back to the document?
After a user navigates away from a document, the document might stay around in a non-"fully active" state, kept in the "back/forward cache (BFCache)", and might be reused when the user navigates back to the document. From the user’s perspective, the non-fully active document is already discarded and thus should not get updates/events that happen after they navigated away from it, especially privacy-sensitive information (e.g. geolocation).
Also, as a document might be reused even after navigation, be aware that tying something to a document’s lifetime also means reusing it after navigations. If this is not desirable, consider listening to changes to the fully active state and doing cleanup as necessary.
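The guidance above can be sketched with a small model of a feature that delivers updates to a document. Everything here is hypothetical: the class name, the setFullyActive hook that a host would call when the fully active state changes, and the choice to drop (rather than queue) updates that arrive while the document is not fully active.

```typescript
// Minimal model of a feature that delivers updates to one document.
class UpdateDelivery<T> {
  private fullyActive = true;
  private delivered: T[] = [];

  // Called by the (hypothetical) host whenever the document's fully
  // active state changes, e.g. when it enters or leaves the BFCache.
  setFullyActive(active: boolean): void {
    this.fullyActive = active;
  }

  // Updates that arrive while the document is not fully active are
  // dropped instead of queued, so nothing that happened after the user
  // navigated away is revealed when the document is restored.
  push(update: T): void {
    if (this.fullyActive) {
      this.delivered.push(update);
    }
  }

  // Snapshot of what the document has actually received.
  received(): T[] {
    return this.delivered.slice();
  }
}
```

Whether dropping is right depends on the feature. The property that matters is that privacy-sensitive data generated while the document was inactive does not reach it later.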
For more detailed guidance on how to handle BFCached documents, see Web Platform Design Principles § non-fully-active and the Supporting BFCached Documents guide.
Note: It is possible for a document to become non-fully active for other reasons not related to BFcaching, such as when the iframe holding the document gets disconnected. Our advice is that all non-fully active documents should be treated the same way. The only difference is that BFCached documents might become fully active again, whereas documents in detached iframes will stay inactive forever. Thus, we suggest paying extra attention to the BFCache case.
2.18. What happens when a document that uses your feature gets disconnected?
If the iframe element containing a document gets disconnected, the document will no longer be fully active. The document will never become fully active again, because if the iframe element gets reconnected, it will load a new document. The document is gone from the user’s perspective, and should be treated as such by your feature as well. You may follow the guidelines for BFCache mentioned above, as we expect BFCached and detached documents to be treated the same way, with the only difference being that BFCached documents can become fully active again.
2.19. What should this questionnaire have asked?
This questionnaire is not exhaustive. After completing a privacy review, it may be that there are privacy aspects of your specification that a strict reading of, and response to, this questionnaire would not have revealed. If this is the case, please convey those privacy concerns, and indicate if you can think of improved or new questions that would have covered this aspect.
Please consider filing an issue to let us know what the questionnaire should have asked.
3. Threat Models
To consider security and privacy it is convenient to think in terms of threat models, a way to illuminate the possible risks.
There are some concrete privacy concerns that should be considered when developing a feature for the web platform [RFC6973]:
- Surveillance: Surveillance is the observation or monitoring of an individual’s communications or activities.
- Stored Data Compromise: Stored data compromise occurs when end systems do not take adequate measures to secure stored data from unauthorized or inappropriate access.
- Intrusion: Intrusion consists of invasive acts that disturb or interrupt one’s life or activities.
- Misattribution: Misattribution occurs when data or communications related to one individual are attributed to another.
- Correlation: Correlation is the combination of various pieces of information related to an individual or that obtain that characteristic when combined.
- Identification: Identification is the linking of information to a particular individual to infer an individual’s identity or to allow the inference of an individual’s identity.
- Secondary Use: Secondary use is the use of collected information about an individual without the individual’s consent for a purpose different from that for which the information was collected.
- Disclosure: Disclosure is the revelation of information about an individual that affects the way others judge the individual.
- Exclusion: Exclusion is the failure to allow individuals to know about the data that others have about them and to participate in its handling and use.
In the mitigations section, this document outlines a number of techniques that can be applied to mitigate these risks.
Enumerated below are some broad classes of threats that should be considered when developing a web feature.
3.1. Passive Network Attackers
A passive network attacker has read-access to the bits going over the wire between users and the servers they’re communicating with. She can’t modify the bytes, but she can collect and analyze them.
Due to the decentralized nature of the internet, and the general level of interest in user activity, it’s reasonable to assume that practically every unencrypted bit that’s bouncing around the network of proxies, routers, and servers you’re using right now is being read by someone. It’s equally likely that some of these attackers are doing their best to understand the encrypted bits as well, including storing encrypted communications for later cryptanalysis (though that requires significantly more effort).
- The IETF’s "Pervasive Monitoring Is an Attack" document [RFC7258] is useful reading, outlining some of the impacts on privacy that this assumption entails.
- Governments aren’t the only concern; your local coffee shop is likely to be gathering information on its customers, and your ISP at home is likely to be doing the same.
3.2. Active Network Attackers
An active network attacker has both read- and write-access to the bits going over the wire between users and the servers they’re communicating with. She can collect and analyze data, but also modify it in-flight, injecting and manipulating JavaScript, HTML, and other content at will. This is more common than you might expect, for both benign and malicious purposes:
- ISPs and caching proxies regularly cache and compress images before delivering them to users in an effort to reduce data usage. This can be especially useful for users on low-bandwidth, high-latency devices like phones.
- ISPs also regularly inject JavaScript [COMCAST] and other identifiers [VERIZON] for less benign purposes.
- If your ISP is willing to modify substantial amounts of traffic flowing through it for profit, it’s difficult to believe that state-level attackers will remain passive.
3.3. Same-Origin Policy Violations
The same-origin policy is the cornerstone of security on the web; one origin should not have direct access to another origin’s data (the policy is more formally defined in Section 3 of [RFC6454]). A corollary to this policy is that an origin should not have direct access to data that isn’t associated with any origin: the contents of a user’s hard drive, for instance. Various kinds of attacks bypass this protection in one way or another. For example:
- Cross-site scripting attacks involve an attacker tricking an origin into executing attacker-controlled code in the context of a target origin.
- Cross-site request forgery attacks trick user agents into exerting a user’s ambient authority on sites where they’ve logged in by submitting requests on their behalf.
- Data leakage occurs when bits of information are inadvertently made available cross-origin, either explicitly via CORS headers [CORS], or implicitly, via side-channel attacks like [TIMING].
3.4. Third-Party Tracking
Part of the power of the web is the ability of a page to pull in content from third parties, from images to JavaScript, to enhance the content and/or a user’s experience of the site. However, when a page pulls in content from third parties, it inherently leaks some information to them: referer information and other information that may be used to track and profile a user. This includes the fact that cookies go back to the domain that initially stored them, allowing for cross-origin tracking. Moreover, third parties can gain execution power when a webpage includes third-party JavaScript. While pages can take steps to mitigate the risks of third-party content, and browsers may differentiate how they treat first- and third-party content from a given page, the risk of new functionality being executed by third parties rather than the first-party site should be considered in the feature development process.
The simplest example is injecting a link to a site that behaves differently under specific conditions, for example depending on whether the user is logged in to the site. This may reveal that the user has an account on that site.
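To make the cookie-based cross-origin tracking described in this section concrete, the following sketch models a third-party server that is embedded on several sites. The class `ThirdPartyTracker`, its methods, and the `uid-` identifier format are all hypothetical; real trackers and real cookie handling are considerably more involved.

```typescript
// Toy model of a third party that receives a request (with its own cookie and
// the embedding page's referrer) every time a page embeds its content.
class ThirdPartyTracker {
  private nextId = 1;
  private visits: Record<string, string[]> = {};

  // Returns the cookie value the browser would store and send back later.
  handleRequest(cookie: string | undefined, referrer: string): string {
    let id = cookie;
    if (id === undefined) {
      // First contact: mint a new identifier.
      id = "uid-" + this.nextId;
      this.nextId += 1;
    }
    const seen = this.visits[id] || [];
    seen.push(referrer);
    this.visits[id] = seen;
    return id;
  }

  // Every page on which this identifier has been observed.
  profile(id: string): string[] {
    return (this.visits[id] || []).slice();
  }
}

const exampleTracker = new ThirdPartyTracker();
const storedCookie = exampleTracker.handleRequest(undefined, "https://news.example/article");
exampleTracker.handleRequest(storedCookie, "https://shop.example/cart");
```

After these two requests, `exampleTracker.profile(storedCookie)` contains both embedding pages, even though neither first-party site shared anything with the other.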
3.5. Legitimate Misuse
Even when powerful features are made available to developers, it does not mean that every use is a good idea or justified; in fact, data privacy regulations around the world may even put limits on certain uses of data. In a first-party context, a legitimate website is potentially able to interact with powerful features to learn about user behavior or habits. For example:
-
Tracking the user while browsing the website via mechanisms such as mouse move tracking
-
Behavioral profiling of the user based on the usage patterns
-
Accessing powerful features that enable the first party to learn about the user’s system, the user themselves, or the user’s surroundings, such as could be done through a webcam or sensors
This point is admittedly different from the others, and underlines that even if something is possible, it does not mean it should always be done; a privacy impact assessment or even an ethical assessment may be needed. When designing features with security and privacy in mind, both use and misuse cases should be in scope.
4. Mitigation Strategies
To mitigate the security and privacy risks you’ve identified in your specification, you may want to apply one or more of the mitigations described below.
4.1. Data Minimization
Minimization is a strategy that involves exposing as little information to other communication partners as is required for a given operation to complete. More specifically, it requires not providing access to more information than was apparent in the user-mediated access or allowing the user some control over which information exactly is provided.
For example, if the user has provided access to a given file, the object representing that should not make it possible to obtain information about that file’s parent directory and its contents as that is clearly not what is expected.
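The file example above can be sketched as follows. The types `PickedFile` and `ExposedFile` and the function `exposeFile` are hypothetical names chosen for illustration; they do not correspond to any existing file API.

```typescript
// What a user agent might hold internally after the user picks a file.
interface PickedFile {
  absolutePath: string; // never handed to the page
  bytes: number[];
}

// What the page receives: the base name and the contents, nothing else.
interface ExposedFile {
  readonly name: string;
  readonly size: number;
  read(): number[];
}

function exposeFile(file: PickedFile): ExposedFile {
  // Keep only the last path segment so the parent directory is not revealed.
  const name = file.absolutePath.split(/[\\/]/).pop() || "";
  const bytes = file.bytes.slice();
  return {
    name: name,
    size: bytes.length,
    read: function (): number[] {
      return bytes.slice(); // hand out a copy, not the internal buffer
    },
  };
}
```

The design choice here is that the exposed object is built from scratch rather than by hiding fields on the internal record, so there is no path-bearing property for a page to discover.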
In the context of data minimization, it is natural to ask what data is passed around between the different parties, how persistent the data items and identifiers are, and whether there are correlation possibilities between different protocol runs.
For example, the W3C Device APIs Working Group has defined a number of requirements in their Privacy Requirements document. [DAP-PRIVACY-REQS]
Data minimization is applicable to specification authors and implementers, as well as to those deploying the final service.
As an example, consider mouse events. When a page is loaded, the application has no way of knowing whether a mouse is attached, what type of mouse it is (e.g., make and model), what kind of capabilities it exposes, how many are attached, and so on. Only when the user decides to use the mouse (presumably because it is required for interaction) does some of this information become available. And even then, only a minimum of information is exposed: you could not know whether it is a trackpad, for instance, and the fact that it may have a right button is only exposed if it is used. The Gamepad API makes use of the same data minimization approach. It is impossible for a Web game to know if the user agent has access to gamepads, how many there are, what their capabilities are, etc. It is simply assumed that if the user wishes to interact with the game through the gamepad then she will know when to activate it, and activating it will provide the application with all the information that it needs to operate (but no more than that).
The way in which the functionality is supported for the mouse is simply by only providing information on the mouse’s behaviour when certain events take place. The approach is therefore to expose event handling (e.g., triggering on click, move, button press) as the sole interface to the device.
Two specifications that have minimized the data their features expose are:
-
[BATTERY-STATUS]
The user agent should not expose high precision readouts
-
[GENERIC-SENSOR]
Limit maximum sampling frequency, Reduce accuracy
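The two kinds of minimization listed above, reducing precision and limiting sampling frequency, can be sketched as below. The function names `coarsen` and `createLimitedSensor` and their parameters are assumptions made for this example; neither [BATTERY-STATUS] nor [GENERIC-SENSOR] defines them.

```typescript
// Reduce precision: snap a reading to one of `stepsPerUnit` buckets per unit.
function coarsen(value: number, stepsPerUnit: number): number {
  return Math.round(value * stepsPerUnit) / stepsPerUnit;
}

// Limit sampling frequency: expose at most `maxHz` readings per second.
// Timestamps are passed in explicitly so the sketch stays deterministic.
function createLimitedSensor(maxHz: number, stepsPerUnit: number) {
  const minIntervalMs = 1000 / maxHz;
  let lastEmitMs = -Infinity;
  return function sample(timestampMs: number, raw: number): number | null {
    if (timestampMs - lastEmitMs < minIntervalMs) {
      return null; // too soon after the previous exposed reading
    }
    lastEmitMs = timestampMs;
    return coarsen(raw, stepsPerUnit);
  };
}
```

For instance, `coarsen(0.8734, 20)` reports a battery level in steps of 0.05, and a sensor created with `createLimitedSensor(10, 10)` drops any sample that arrives less than 100 ms after the last one it exposed.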
4.2. Default Privacy Settings
Users often do not change defaults. As a result, it is important that the default mode of a specification minimizes the amount, identifiability, and persistence of the data and identifiers exposed. This is particularly true if a protocol comes with flexible options so that it can be tailored to specific environments.
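One way to express privacy-preserving defaults in an API with flexible options is to start from the most conservative configuration and require callers to opt in to anything more revealing. The option names below (`highPrecision`, `persistIdentifier`, `retentionDays`) are hypothetical and exist only to illustrate the pattern.

```typescript
interface FeatureOptions {
  highPrecision: boolean;     // expose full-precision readings
  persistIdentifier: boolean; // keep an identifier across sessions
  retentionDays: number;      // how long collected data is stored
}

// The default mode exposes the least data for the shortest time.
const PRIVATE_DEFAULTS: FeatureOptions = {
  highPrecision: false,
  persistIdentifier: false,
  retentionDays: 0,
};

// Callers must name each more revealing option explicitly.
function resolveOptions(overrides: Partial<FeatureOptions> = {}): FeatureOptions {
  return { ...PRIVATE_DEFAULTS, ...overrides };
}
```

A caller that passes nothing gets the conservative configuration, and a caller that passes `{ retentionDays: 7 }` changes only that one setting.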
4.3. Explicit user mediation
If the security or privacy risk of a feature cannot otherwise be mitigated in a specification, optionally allowing an implementer to prompt a user may be the best mitigation possible, understanding it does not entirely remove the privacy risk. If the specification does not allow for the implementer to prompt, it may result in divergent implementations by different user agents as some user agents choose to implement a more privacy-friendly version.
It is possible that the risk of a feature cannot be mitigated because the risk is endemic to the feature itself. For instance, [GEOLOCATION-API] reveals a user’s location intentionally; user agents generally gate access to the feature on a permission prompt which the user may choose to accept. This risk is also present and should be accounted for in features that expose personal data or identifiers.
Designing such prompts is difficult, as is determining how long the resulting permission should last.
Often, the best prompt is one that is clearly tied to a user action, like the file picker, where in response to a user action, the file picker is brought up and a user gives access to a specific file to an individual site.
Generally speaking, the duration and timing of the prompt should be inversely proportional to the risk posed by the data exposed. In addition, the prompt should consider issues such as:
-
How should permission requests be scoped? Especially when requested by an embedded third party iframe?
-
Should persistence be based on the pair of top-level/embedded origins or a different scope?
-
How can it be ensured that the prompt occurs in the context of requiring the data, and at a time when it is clear to the user why the prompt is occurring?
-
Explaining the implications of permission before prompting the user, in a way that is accessible and localized -- _who_ is asking, _what_ are they asking for, _why_ do they need it?
-
What happens if the user rejects the request at the time of the prompt, or if the user later changes their mind and revokes access?
These prompts should also include considerations for what, if any, control a user has over their data after it has been shared with other parties. For example, are users able to determine what information was shared with other parties?
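Several of the questions above (scoping, persistence, expiry, and revocation) can be made concrete with a small permission store. This is a sketch under stated assumptions: the class `PermissionStore`, its key format, and the choice to scope decisions to the pair of top-level and embedded origins are illustrative and not taken from any specification.

```typescript
type Decision = "granted" | "denied";

interface StoredDecision {
  decision: Decision;
  expiresAtMs: number;
}

class PermissionStore {
  private decisions: Record<string, StoredDecision> = {};

  // Scope each decision to (feature, top-level origin, embedded origin).
  private key(feature: string, topLevelOrigin: string, embeddedOrigin: string): string {
    return [feature, topLevelOrigin, embeddedOrigin].join(" ");
  }

  record(feature: string, topLevelOrigin: string, embeddedOrigin: string,
         decision: Decision, nowMs: number, lifetimeMs: number): void {
    this.decisions[this.key(feature, topLevelOrigin, embeddedOrigin)] = {
      decision: decision,
      expiresAtMs: nowMs + lifetimeMs,
    };
  }

  // "prompt" means there is no usable decision and the user must be asked.
  query(feature: string, topLevelOrigin: string, embeddedOrigin: string,
        nowMs: number): Decision | "prompt" {
    const stored = this.decisions[this.key(feature, topLevelOrigin, embeddedOrigin)];
    if (!stored || nowMs >= stored.expiresAtMs) {
      return "prompt";
    }
    return stored.decision;
  }

  // The user changes their mind later: forget the decision entirely.
  revoke(feature: string, topLevelOrigin: string, embeddedOrigin: string): void {
    delete this.decisions[this.key(feature, topLevelOrigin, embeddedOrigin)];
  }
}
```

With this scoping, a grant given to an embedded origin on one top-level site does not carry over when the same origin is embedded elsewhere, and a shorter `lifetimeMs` can be chosen for riskier data.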
4.4. Explicitly restrict the feature to first party origins
As described in the "Third-Party Tracking" section, web pages mix first and third party content into a single application, which introduces the risk that third party content can misuse the same set of web features as first party content.
Authors should explicitly specify a feature’s scope of availability:
-
When a feature should be made available to embedded third parties. Often, first parties should be able to explicitly control that (using iframe attributes or feature policy).
-
Whether a feature should be available in the background or only in the top-most, visible tab.
-
Whether a feature should be available to offline service workers.
-
Whether events will be fired simultaneously.
Implementing third-party access to a feature should be optional for conformance.
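The availability decisions listed above can be sketched as a single check. The `Allowlist` type, the `FrameContext` shape, and the function `isFeatureAllowed` are hypothetical; the allowlist is only loosely modeled on the idea of a first party controlling embedded access, and defaults to first-party only.

```typescript
// "self": only frames same-origin with the top-level page.
// "*": any embedded origin. Otherwise: an explicit list of origins.
type Allowlist = "self" | "*" | string[];

interface FrameContext {
  topLevelOrigin: string;
  frameOrigin: string;
  visible: boolean; // is the document in the top-most, visible tab?
}

function isFeatureAllowed(
  ctx: FrameContext,
  allowlist: Allowlist = "self",
  requireVisible: boolean = true
): boolean {
  if (requireVisible && !ctx.visible) {
    return false; // not available in the background
  }
  if (ctx.frameOrigin === ctx.topLevelOrigin) {
    return true; // first-party content
  }
  if (allowlist === "self") {
    return false; // third parties are excluded by default
  }
  if (allowlist === "*") {
    return true;
  }
  return allowlist.indexOf(ctx.frameOrigin) !== -1;
}
```

Because the default allowlist is `"self"`, third-party access exists only where the first party has opted in, which keeps that access optional rather than assumed.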
4.5. Secure Contexts
If the primary risk that you’ve identified in your specification is the threat posed by an active network attacker, offering a feature to an insecure origin is the same as offering that feature to every origin because the attacker can inject frames and code at will. Requiring an encrypted and authenticated connection in order to use a feature can mitigate this kind of risk.
Secure contexts also protect against passive network attackers. For example, if a page uses the Geolocation API and sends the sensor-provided latitude and longitude back to the server over an insecure connection, then any passive network attacker can learn the user’s location, without any feasible path to detection by the user or others.
However, requiring a secure context is not sufficient to mitigate many privacy risks, or even security risks from threat actors other than active network attackers.
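As a simplified illustration of gating a feature on an encrypted and authenticated connection, the sketch below accepts URLs delivered over TLS schemes and loopback addresses only. The function name `isLikelySecureOrigin` is hypothetical, and the check is a deliberate approximation: it ignores framing, ancestors, and other conditions that a user agent would evaluate when deciding whether a context is secure.

```typescript
// Rough approximation of "was this delivered securely?" based on the URL alone.
function isLikelySecureOrigin(url: string): boolean {
  const parsed = new URL(url);
  if (parsed.protocol === "https:" || parsed.protocol === "wss:") {
    return true;
  }
  const host = parsed.hostname;
  // Loopback names and addresses never cross the network.
  if (/(^|\.)localhost$/.test(host) || host === "[::1]") {
    return true;
  }
  // IPv4 loopback range 127.0.0.0/8.
  return /^127\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test(host);
}

// Hypothetical gate: refuse to expose the feature outside such origins.
function requireSecureOrigin(url: string): void {
  if (!isLikelySecureOrigin(url)) {
    throw new Error("Feature is only available in secure contexts");
  }
}
```

As the text above notes, passing this check says nothing about what the page itself does with the data once it has it.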
4.6. Drop the feature
Possibly the simplest way to mitigate potential negative security or privacy impacts of a feature is to drop the feature, though you should keep in mind that some security or privacy risks may be removed or mitigated by adding features to the platform. Every feature in a specification should be seen as potentially adding security and/or privacy risk until proven otherwise. Discussing dropping the feature as a mitigation for security or privacy impacts is a helpful exercise as it helps illuminate the tradeoffs between the feature, whether it is exposing the minimum amount of data necessary, and other possible mitigations.
Consider also the cumulative effect of feature addition to the overall impression that users have that it is safe to visit a web page. Doing things that complicate users' understanding that it is safe to visit websites, or that complicate what users need to understand about the safety of the web (e.g., adding features that are less safe) reduces the ability of users to act based on that understanding of safety, or to act in ways that correctly reflect the safety that exists.
Every specification should seek to be as small as possible, even if only for the reasons of reducing and minimizing security/privacy attack surface(s). By doing so we can reduce the overall security and privacy attack surface of not only a particular feature, but of a module (related set of features), a specification, and the overall web platform.
Examples
-
Mozilla dropped devicelight, deviceproximity and userproximity events
4.7. Making a privacy impact assessment
Some features potentially supply sensitive data, and it is the responsibility of the end-developer, system owner, or manager to realize this and act accordingly in the design of their system. Some uses may warrant conducting a privacy impact assessment, especially when data relating to individuals may be processed.
Specifications that include features that expose sensitive data should include recommendations that websites and applications adopting the API conduct a privacy impact assessment of the data that they collect.
A feature that does this is:
-
[GENERIC-SENSOR] advises considering a privacy impact assessment
Documenting these impacts is important for organizations although it should be noted that there are limitations to putting this onus on organizations. Research has shown that sites often do not comply with security/privacy requirements in specifications. For example, in [DOTY-GEOLOCATION], it was found that none of the studied websites informed users of their privacy practices before the site prompted for location.
Acknowledgements
Many thanks to Alice Boxhall, Alex Russell, Anne van Kesteren, Chris Cunningham, Coralie Mercier, Corentin Wallez, David Baron, Domenic Denicola, Dominic Battre, Jeffrey Yasskin, Jeremy Roman, Jonathan Kingston, Marcos Caceres, Marijn Kruisselbrink, Mark Nottingham, Martin Thomson, Michael(tm) Smith, Mike Perry, Nick Doty, Robert Linder, Piotr Bialecki, Samuel Weiler, Tantek Çelik, Thomas Steiner, Wendy Seltzer, and the many current and former participants in PING and the TAG for their contributions to this document.
Special thanks to Rakina Zata Amni for her edits which help spec authors take the bfcache into account.
Mike West wrote the initial version of this document and edited it for a number of years. Yan Zhu took over from Mike and, in turn, Jason Novak and Lukasz Olejnik took it over from her. The current editors are indebted to all of their hard work. We hope we haven’t made it (much) worse.