Have you ever gone to share a URL on social media or by email and found that it was much longer than you had expected? Take the following contrived URL as an example:
If so, you might have noticed that, as in our example, the URL was as much about you, the client, as it was about the Web resource you were trying to access. Indeed, Internet addresses can contain a wealth of information about the identities and activities of the users who visit them. URLs often use query strings (i.e., key-value pairs appended to the URL path; in our example, everything after the question mark) to pass session parameters and form data. Although query strings are sometimes benign and necessary to render the Web page, they often contain tracking mechanisms, user names, email addresses, and other information that users may not wish to reveal publicly. In isolation this is not particularly problematic, but the growth of Web 2.0 platforms such as social networks and micro-blogging services means such URLs are increasingly being broadcast publicly.
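To make the idea of a query string concrete, the short Python sketch below uses the standard library's `urllib.parse` to split a URL into its query key-value pairs and to flag values that look like email addresses. The example URL, the helper names `query_pairs` and `emails_in_query`, and the deliberately loose email pattern are hypothetical and for illustration only; they are not taken from the study.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Deliberately loose pattern for illustration; it is not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def query_pairs(url: str) -> list[tuple[str, str]]:
    """Return the decoded key-value pairs after the '?' (the fragment is excluded)."""
    return parse_qsl(urlsplit(url).query, keep_blank_values=True)


def emails_in_query(url: str) -> list[tuple[str, str]]:
    """Return only the pairs whose value looks like an email address."""
    return [(k, v) for k, v in query_pairs(url) if EMAIL_RE.match(v)]


# Hypothetical URL for illustration.
url = "https://www.example.com/article?id=42&ref=newsletter&user=alice%40example.org"
print(query_pairs(url))      # [('id', '42'), ('ref', 'newsletter'), ('user', 'alice@example.org')]
print(emails_in_query(url))  # [('user', 'alice@example.org')]
```

Note that `parse_qsl` percent-decodes values, so `alice%40example.org` is reported as `alice@example.org`: URL encoding may hide a leak from casual inspection, but it is trivially reversible.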
Andrew G. West, a Research Scientist at Verisign Labs, and his collaborator Adam J. Aviv, a professor at the U.S. Naval Academy, examined nearly 900 million user-submitted URLs to gauge the prevalence and severity of such privacy leaks. Within the corpus they found troves of personal information. Almost 55 percent of the URLs have a query string, and of those, 53 percent disclose referrer data (i.e., how you got to the page), with at least 2.7 percent having more acute privacy ramifications. For example, 1.7 million email addresses were found in the data, but the most egregious incidents were the several dozen cases where query strings contained usernames and passwords for administrative and sensitive accounts in *plaintext*. The study also found that mobile devices contribute a disproportionately large share of the problem, perhaps because small screens and cumbersome input methods keep users from noticing and manually removing private data.
With this as motivation, the researchers propose a privacy-aware URL sanitization service named “CleanURL.” Its goal is to transform input addresses by stripping non-essential key-value pairs and/or notifying users when sensitive data is critical to proper page rendering. Such a system could be user-facing, built transparently into online platforms, and/or applied retroactively to existing links. In any form, the goal of this research and the proposed system echoes one of Verisign's own: increasing the safety and security of the Internet for corporations and individuals alike.
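One way to picture the stripping-and-notifying behavior described above is the minimal Python sketch below. It assumes that the set of keys essential to rendering a given page is already known and is passed in by the caller (how CleanURL would actually determine that is not described here), drops every other query pair, and emits a warning when a retained pair looks sensitive. The function `sanitize`, the keyword list in `SENSITIVE_KEY_RE`, and the email pattern are hypothetical heuristics, not the researchers' method.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical heuristics for illustration; not CleanURL's actual rules.
SENSITIVE_KEY_RE = re.compile(r"pass|pwd|token|session|email|user", re.IGNORECASE)
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def looks_sensitive(key: str, value: str) -> bool:
    """Flag a pair whose key name or value suggests private data."""
    return bool(SENSITIVE_KEY_RE.search(key) or EMAIL_RE.match(value))


def sanitize(url: str, essential_keys: set[str]) -> tuple[str, list[str]]:
    """Drop query pairs not in essential_keys; warn about sensitive pairs that are kept."""
    parts = urlsplit(url)
    kept: list[tuple[str, str]] = []
    warnings: list[str] = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key not in essential_keys:
            continue  # Non-essential (e.g., tracking or referrer data): strip it.
        kept.append((key, value))
        if looks_sensitive(key, value):
            warnings.append(f"'{key}' looks sensitive but is needed to render the page")
    # Rebuild the URL with only the retained pairs; an empty query drops the '?'.
    clean = urlunsplit(parts._replace(query=urlencode(kept)))
    return clean, warnings


# Hypothetical URL and essential-key set for illustration.
url = "https://www.example.com/article?id=42&utm_source=newsletter&session=abc123"
clean, warnings = sanitize(url, essential_keys={"id", "session"})
print(clean)     # https://www.example.com/article?id=42&session=abc123
print(warnings)  # ["'session' looks sensitive but is needed to render the page"]
```

Because the retained pairs are re-serialized with `urlencode`, their encoding may differ cosmetically from the original (for example, a space becomes `+`); a real service would need to decide whether that is acceptable for the sites it handles.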
This research was initially published at the 8th Workshop on Web 2.0 Security and Privacy (W2SP 2014), and an expanded journal version is currently in submission.
Read the full report, On the Privacy Concerns of URL Query Strings [PDF].