Monday, January 9, 2017

Your personal data in a box!

By Hamed Haddadi @realhamed

Today's personal data ecosystem is in a fragile state. A large number of smartphone apps [1], third-party trackers [2], and social media platforms collect and aggregate personal information in order to provide location-based services or targeted advertising. This comes at several costs, including to our privacy, energy, and bandwidth. However, any attempt to reduce these costs that runs counter to the basic economics of so many Internet services is unlikely to succeed. Likewise, personal data vaults and information silos are bound to become targets for security attacks.

In a new EPSRC-funded project, we are building the Databox [3], a personal networked device (and associated services) that collates and mediates access to personal and IoT data, allowing us to recover control of our online lives. The Databox is a first step to re-balancing power between us, the data subjects, and the corporations that collect and use our data.

In the Databox Project, which started in October 2016, researchers are creating a hardware platform and an open-source software suite that brings third-party apps to personal data without jeopardising individuals' privacy. This can be achieved by performing analytics over encrypted data [4], by carrying out a first level of aggregation and data analysis at the user end, or by analysing data in a distributed manner across the boxes in a community [5]. The Databox is not a data silo: it allows users to interact with their data using concepts inherited from the Human-Data Interaction framework, and to install verified, authorised third-party apps that have isolated and controlled access to different data sources.
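To make "isolated and controlled access" a little more concrete, here is a minimal Python sketch of a box that mediates every read an installed app makes. Everything in it is an assumption for illustration: the `AppManifest` and `Databox` classes, the source names, and the `verified` flag are hypothetical and are not the Databox project's actual API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AppManifest:
    """Hypothetical app manifest: the data sources an app asks to read."""
    name: str
    requested_sources: frozenset[str]
    verified: bool  # assumed to mean "signed/reviewed before installation"


@dataclass
class Databox:
    """Toy mediator holding per-source data and per-app access grants."""
    sources: dict[str, list] = field(default_factory=dict)
    grants: dict[str, frozenset[str]] = field(default_factory=dict)

    def install(self, app: AppManifest, user_approved: set[str]) -> None:
        # Refuse unverified apps outright.
        if not app.verified:
            raise PermissionError(f"{app.name} is not a verified app")
        # Grant only the sources the app requested AND the user approved.
        self.grants[app.name] = app.requested_sources & frozenset(user_approved)

    def read(self, app_name: str, source: str) -> list:
        # Every read goes through the box: no grant, no data.
        if source not in self.grants.get(app_name, frozenset()):
            raise PermissionError(f"{app_name} may not read {source!r}")
        # Hand back a copy so the app cannot mutate the stored data.
        return list(self.sources.get(source, []))


if __name__ == "__main__":
    box = Databox(sources={"steps": [4200, 8100], "location": [(51.5, -0.12)]})
    coach = AppManifest("fitness-coach", frozenset({"steps", "location"}), verified=True)
    box.install(coach, user_approved={"steps"})
    print(box.read("fitness-coach", "steps"))  # [4200, 8100]
    try:
        box.read("fitness-coach", "location")
    except PermissionError as err:
        print("denied:", err)
```

The design choice here is deny-by-default: an app sees only the intersection of what it asked for and what the user approved, which is one simple way to keep each app isolated from data sources it was never granted.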

One might ask: why would we put all our data together? What's in it for the user? Where's the business model? One way to think about it is the Android or iOS ecosystem, where the inherent value lies in the apps and the data. In the Databox model, users might actually be tempted to pay for an app that performs, for example, income and expenditure analysis to find the best mortgage, or mental and physical health analysis, without giving up all their personal data and smartphone battery life. The app ecosystem is limited only by developers' imagination and users' needs.

Research in this space is the first step in fighting the privacy battle. The complex regulatory aspects of acquiring and trading personal data, and the various geographical jurisdictions governing (or failing to govern) personal data, all make for a challenging and bumpy road ahead. What is certain is that the current wild-west nature of personal data cannot continue for much longer.

[1] How Private Are Health-Tracking Apps on Your Phone?

[2] The Murky World of Third Party Web Tracking.

[3] Hamed Haddadi, Heidi Howard, Amir Chaudhry, Jon Crowcroft, Anil Madhavapeddy, Derek McAuley, Richard Mortier, "Personal Data: Thinking Inside the Box", The 5th decennial Aarhus conference (Aarhus 2015), August 2015.

[4] Wang, Frank, James Mickens, Nickolai Zeldovich, and Vinod Vaikuntanathan. "Sieve: cryptographically enforced access control for user data in untrusted clouds." In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16), 2016.

[5] Hamed Haddadi, Richard Mortier, Steven Hand, Ian Brown, Eiko Yoneki, Derek McAuley, and Jon Crowcroft, "Privacy Analytics", ACM SIGCOMM Computer Communication Review, April 2012.

Tuesday, September 8, 2015

Privacy-Preserving Personal Mobile Databox

By Mateus Felipe Eisenkraemer

It has been more than two years since the world was shocked by the NSA surveillance scandal in which, according to The Guardian, “Personal data of citizens was intercepted indiscriminately”. Certainly, this scandal led to a global awareness of how valuable our personal information is and how badly it can be misused.

Unfortunately, there are very few studies and tools focused on providing alternatives for keeping our private data safe without completely disrupting current targeted ad campaigns or opening the door to terrorist threats.

Our online interactions seem to have increased exponentially in the past few years. With more devices and services being used every day, it is really hard to be aware of how much data we generate each day, and even harder to know how our data is being used by others.
Considering these constraints, we propose an Android mobile app that enables people to engage with the collection and management of their own personal data. The platform can be referred to as a Databox, and it will be situated on the user's own smartphone, with all the gathered data available there. The main reason all the data will be stored only on the user's own device is that a range of privacy threats arise from, for example, storing all this data about us on a third-party website or cloud service.

The data gathered and stored in the Databox is the following:
·        Online profile: Facebook profile information, such as name, gender, locale, email and age.
·        Individual: personal location history, collected with the device's GPS.
·        Online social network sentiment analysis: a sentiment analysis of the user's own posts on Twitter.
·        Online social network trend analysis: a daily trend analysis of the posts made by the user and their connections on Twitter and Instagram.
·        Health: the total number of steps taken by the user each day.
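The Health entry only needs a per-day total, which can be computed on the phone itself so that raw readings never leave the device. A minimal Python sketch, assuming a hypothetical `StepEvent` record in which each reading is the number of steps since the previous reading; this is not the app's actual data format.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class StepEvent:
    """One step-counter reading (hypothetical format).

    Assumes each reading is an increment since the previous one; a
    cumulative counter (e.g. a since-boot total) would need differencing first.
    """
    timestamp: datetime
    steps: int


def daily_step_totals(events: list[StepEvent]) -> dict[date, int]:
    """Sum step readings per calendar day, computed entirely on the device."""
    totals: dict[date, int] = defaultdict(int)
    for event in events:
        totals[event.timestamp.date()] += event.steps
    return dict(totals)


if __name__ == "__main__":
    readings = [
        StepEvent(datetime(2015, 9, 7, 8, 0), 1200),
        StepEvent(datetime(2015, 9, 7, 18, 30), 3400),
        StepEvent(datetime(2015, 9, 8, 9, 15), 500),
    ]
    print(daily_step_totals(readings))
```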

The current objective of the Databox is to give users awareness of, and control over, their own online-generated data. But we foresee a wider range of uses for this application. We plan to give users of the app the ability to choose certain pieces of data to make available to third parties as a form of payment for a service, or simply as a token of appreciation. This could enable many different types of interaction, with the Databox acting as a new type of currency available for us to use.
Targeted advertising models built on the analysis of personal data could also benefit from the Databox, in a much less invasive way. Media companies and end users could agree on which pieces of data are used and how, always allowing each of us to stay in total control of our online-generated data and to decide who should have access to it.
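Letting users choose which pieces of data a given third party may see can be sketched as a per-party allow-list applied at export time, plus an audit trail so users stay aware of what they have shared. The `disclose` function, the policy shape and the log format below are hypothetical, for illustration only.

```python
from datetime import datetime, timezone


def disclose(record: dict[str, object],
             policy: dict[str, set[str]],
             party: str,
             audit_log: list[tuple[datetime, str, list[str]]]) -> dict[str, object]:
    """Return only the fields the user agreed to share with `party`.

    Parties with no policy entry get nothing (deny by default), and every
    disclosure, including empty ones, is recorded in `audit_log`.
    """
    allowed = policy.get(party, set())
    shared = {key: value for key, value in record.items() if key in allowed}
    # Record when, to whom, and which field names were released.
    audit_log.append((datetime.now(timezone.utc), party, sorted(shared)))
    return shared


if __name__ == "__main__":
    profile = {"name": "Alice", "gender": "f", "locale": "en_GB",
               "email": "alice@example.org", "age": 30}
    policy = {"ad-network": {"locale", "age"}}  # agreed with this party only
    log: list[tuple[datetime, str, list[str]]] = []
    print(disclose(profile, policy, "ad-network", log))
    print(disclose(profile, policy, "unknown-tracker", log))
```

Keeping the audit log alongside the policy ties selective sharing back to the app's first goal of user awareness: the user can always see what was handed to whom.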

The Databox project is open source and publicly available; all contributions are greatly appreciated.

Wednesday, July 8, 2015

Our response to VPN providers' request & reactions

We have had a lot of interest in some of our recent work, “A Glance through the VPN Looking Glass: IPv6 Leakage and DNS Hijacking in Commercial VPN Clients”, which has just been published at the Privacy Enhancing Technologies Symposium in Philadelphia. We’re delighted that it has made a positive impact, with many VPN providers announcing fixes for the issues we raised; we notified all of them months before publishing the paper. However, we wanted to add a few extra comments in light of the statements made in recent days.

A few of the VPN providers say that our paper is out of date and that they have deployed fixes. This is great news, and something we were obviously hoping for. We were fastidious about contacting all VPN providers before publication, to give them the opportunity to deploy fixes before the results were made public. In many cases, the VPN providers have explicitly acknowledged us as the reason they deployed the fix. When we contacted them, some of the companies ignored our emails. We therefore reject claims that we were irresponsible in disclosing this work: any VPNs that remain vulnerable are solely responsible for their own infrastructure and for the privacy and security of their customers.

We’ve been contacted by several VPNs that we did not include in the study. Further, some of the VPNs that we did study have dismissed our findings, stating that they are not vulnerable. While some have since deployed fixes, we are highly confident that our experiments were accurate before those fixes were made. We also want to thank the VPNs that came out and acknowledged our work as accurate rather than denying it. One way in which companies have claimed our experiments are wrong is by saying that the number of exit points we reported was incorrect. This is misleading. We were explicit that these numbers were those observed from our vantage point; we never claimed to have a comprehensive view of all their exit points. Moreover, this has no bearing on the security attacks discussed, so it is largely irrelevant. Sadly, we cannot keep repeatedly testing every VPN that contacts us, as we simply don’t have the manpower. However, we encourage all VPNs to repeat the tests detailed in the paper and to deploy fixes if the vulnerabilities are found.

Lastly, we want to emphasise the key point of our study: VPNs are not designed to provide comprehensive anonymity. We were glad to see that some of the companies were very candid about this, and spoke publicly about the benefits of Tor and the limited anonymity offered by a VPN. If you are genuinely concerned about your online privacy, then a service such as Tor is more appropriate. If you simply want to obtain an IP address in another country then, sure, a VPN is perfectly suitable.


Vasile Claudiu Perta, Marco Valerio Barbera, Gareth Tyson, Hamed Haddadi, Alessandro Mei, “A Glance through the VPN Looking Glass: IPv6 Leakage and DNS Hijacking in Commercial VPN clients”, The 15th Privacy Enhancing Technologies Symposium (PETS 2015), June 30 – July 2, 2015, Philadelphia, PA, USA (paper, Hacker News, The Register, Tech Times).

Wednesday, March 11, 2015

How a Box Could Solve the Personal Data Conundrum

The MIT Technology Review has a nice article on our new Databox paper, which was followed by coverage in the Guardian. The idea is to be able to index all your personal data, ready for cross-correlation and research! Please have a read:

Hamed Haddadi, Heidi Howard, Amir Chaudhry, Jon Crowcroft, Anil Madhavapeddy, Richard Mortier, “Personal Data: Thinking Inside the Box”, January 2015, available on arXiv [paper, MIT Technology Review, Guardian]

A Glance through the VPN Looking Glass: IPv6 Leakage and DNS Hijacking in Commercial VPN clients

Have you ever wondered how individuals in countries with restricted Internet access use services such as Facebook and Twitter? Are these users safe from their governments' ability to monitor their browsing behaviour? In many such places, commercial Virtual Private Network (VPN) services have become a popular and convenient option for users seeking privacy and anonymity. They have been applied to a wide range of use cases, e.g., censorship circumvention, anonymity and protection from monitoring and tracking, with commercial providers often making bold claims about their ability to fulfil each of these needs.

In our new paper, to appear at the 15th Privacy Enhancing Technologies Symposium (PETS 2015), we investigate the privacy and anonymity claims of commercial VPN services. We analyse 14 of the most popular ones, inspecting their internals and their infrastructures. To our surprise, and despite it being a known issue, our experimental study reveals that the majority of VPN services suffer from IPv6 traffic leakage.

IPv6 is an increasingly popular means of web access and is being adopted worldwide. When a VPN client redirects only IPv4 traffic through its tunnel, any IPv6 traffic can bypass the tunnel altogether. Hence, our paper highlights that people using these VPN services may actually have their web browsing habits leaked to any organisation monitoring them. Perhaps most concerning is the unfounded but common belief that these VPN services are securely hiding users' web browsing activities. We have informed all of these VPN providers about this study and our findings, and we hope they will address this issue immediately.
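A quick way to get a feel for whether IPv6 might bypass a VPN on your own machine is to ask the operating system which local IPv6 address it would use to reach a public IPv6 host while the VPN is connected. The Python sketch below is a rough heuristic, not the methodology used in the paper: the probe address (Google Public DNS) is an arbitrary choice, and `classify` and its verdict strings are hypothetical. If your VPN does carry IPv6, pass the tunnel's IPv6 address(es) in `tunnel_addresses` so they are not flagged.

```python
import ipaddress
import socket
from typing import Optional

# Arbitrary public IPv6 address used only to ask the OS for a route
# (Google Public DNS); any reachable global IPv6 address would do.
PROBE_HOST = "2001:4860:4860::8888"


def ipv6_source_address(probe_host: str = PROBE_HOST, port: int = 53) -> Optional[str]:
    """Return the local IPv6 address the OS would use to reach probe_host.

    connect() on a UDP socket only selects a route; no packet is sent.
    Returns None if IPv6 is unavailable or there is no IPv6 route.
    """
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as sock:
            sock.connect((probe_host, port))
            return sock.getsockname()[0]
    except OSError:
        return None


def classify(source: Optional[str], tunnel_addresses: set[str]) -> str:
    """Rough verdict for a host whose VPN is supposed to carry all traffic."""
    if source is None:
        return "no IPv6 route"
    addr = ipaddress.ip_address(source.split("%")[0])  # drop any zone index
    if addr in {ipaddress.ip_address(a) for a in tunnel_addresses}:
        return "IPv6 routed via tunnel"
    if addr.is_global:
        return "possible leak: native global IPv6 route outside the tunnel"
    return "non-global source; inspect manually"


if __name__ == "__main__":
    src = ipv6_source_address()
    print(src, "->", classify(src, tunnel_addresses=set()))
```

A "possible leak" verdict only says that IPv6 packets would leave through a non-tunnel address; confirming an actual leak still requires observing the traffic, as the tests in the paper do.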

Vasile Claudiu Perta, Marco Valerio Barbera, Gareth Tyson, Hamed Haddadi, Alessandro Mei, “A Glance through the VPN Looking Glass: IPv6 Leakage and DNS Hijacking in Commercial VPN clients”, The 15th Privacy Enhancing Technologies Symposium (PETS 2015), June 30 – July 2, 2015, Philadelphia, PA, USA (paper)

Tuesday, November 11, 2014

WeChat – A good example of bad API security!

Over the last few days, Muhammad Haris has been looking into the network traffic of mobile applications for related research on mobile privacy and security. In the process, however, we found a shocking vulnerability in the very popular WeChat messenger application. First, for those of you who have never heard of WeChat: it is the primary messenger app in Asia, specifically China, used by half a billion people.

The app has an important feature called Moments, with which users can share photos and status updates with their friends only. For the sake of privacy, photos and statuses shared by your friends are visible only to you in your Moments (unlike the Facebook wall, where friends-of-friends visibility is the default). Similarly, there is an Album feature, which lets you see all the pictures and statuses shared by a particular friend; you can visit a friend's album through their profile. The important point to note here is that WeChat's official explanation states that things shared on Moments (and in albums) are visible to your friends only. However, it seems that the backend communication between the WeChat application and its server suffers from serious security flaws. Let me take you there step by step.

To capture the mobile application's traffic, Haris used Fiddler, which is a very nice proxy; should you wish to use Fiddler to capture your own mobile traffic, you can follow this nice tutorial by Troy Hunt.

When you visit WeChat Moments on your phone, all the pictures visible in your Moments are loaded over HTTP. Hence, just by looking at the mobile traffic captured in Fiddler, you can get all the pictures in Moments. Picture 1 shows Haris's WeChat Moments, where his friend has shared some photos of a hiking trip.
In Picture 2, as we browse Moments, the highlighted packets are those sent by the WeChat server to the application on the phone. Notice that the protocol used for these packets is HTTP, which means the photo data is not sent over a secure HTTPS channel.

Picture 3 and Picture 4 show selected individual packets; in them you can see the same pictures of his friend that appear in Moments.
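The check performed by eye above (which responses travelled over plain HTTP, and which of them were images) can be sketched in a few lines of Python. This assumes a hypothetical export of the capture as `(url, content_type)` pairs, for example from an intercepting proxy; it is not a Fiddler API, and the host names in the example are placeholders.

```python
from collections import Counter
from urllib.parse import urlsplit


def plaintext_images(capture: list[tuple[str, str]]) -> tuple[Counter, list[str]]:
    """Find responses that travelled over plain HTTP.

    `capture` is a list of (url, content_type) pairs (hypothetical export
    format). Returns a per-host count of plain-HTTP responses and the URLs
    of those whose content type says they carried an image.
    """
    per_host: Counter = Counter()
    images: list[str] = []
    for url, content_type in capture:
        parts = urlsplit(url)
        if parts.scheme.lower() != "http":
            continue  # HTTPS traffic is encrypted in transit; skip it
        per_host[parts.hostname] += 1
        if content_type.lower().startswith("image/"):
            images.append(url)
    return per_host, images


if __name__ == "__main__":
    capture = [
        ("http://img.example.com/a", "image/jpeg"),
        ("https://api.example.com/feed", "application/json"),
        ("http://img.example.com/b", "image/png"),
    ]
    print(plaintext_images(capture))
```

Relying on the content type rather than the URL's file extension is deliberate: image URLs served by apps often have no extension at all.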

This means that if a Wi-Fi provider collects the raw packets exchanged with mobile devices on its network, it can easily obtain all the photos of your friends as soon as you visit Moments in WeChat. As a result, on the one hand, personal photos (of you and your friends) can be leaked, and on the other hand, a malicious Wi-Fi provider can also infer your social links by looking at the pictures. This illustrates just how harmful poorly implemented API (application programming interface) security can be!

Among other online social networks (OSNs), Instagram still seems to suffer from similar security issues, but in WeChat's case there is no need to hijack any sessions: it's all open by default! Google offered full encryption as an option for Gmail in 2008 and made it the default two years later. Facebook switched it on by default in January 2011.

Saturday, July 5, 2014

Privacy-preserving Adsense Systems Using Delay Tolerant Networking

An undergraduate student of mine did this work, building on our research on MobiAd, and I found it pretty impressive!

With the ever-increasing number of smartphones, a growing number of people view advertisements on their phones, and hence the smartphone advertising market has become rich and noticeable. To raise click-through rates and maximize profit, ad brokers make their ads more personalized and targeted. Therefore, they collect personal information to build an accurate user profile. The use of sensitive and personal information may raise privacy concerns. In this paper we focus on using Delay Tolerant Networking (DTN) to anonymize click reports, aiming to stop attackers from tracking and identifying users based on behaviour and location. The results of our simulations show that a few-hop DTN-based system can protect users' identity and privacy without heavily increasing their energy costs.
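The intuition behind few-hop anonymisation can be illustrated with a toy Python simulation: a click report carrying no user identifier is handed from phone to phone over a few opportunistic encounters, and only the final holder uploads it, so the broker sees the uploader rather than the phone that clicked. This is not the student's system. It assumes uniformly random encounters among a fixed set of devices, and it ignores energy cost, realistic contact patterns and collusion; `relay_uploader` and `linkability` are hypothetical names.

```python
import random


def relay_uploader(origin: str, devices: list[str], hops: int, rng: random.Random) -> str:
    """Hand an identifier-free click report through `hops` encounters.

    Each hop models an opportunistic DTN contact as a uniform random pick of
    a device other than the current holder; the final holder uploads the
    report, so the broker sees that device rather than the one that clicked.
    """
    holder = origin
    for _ in range(hops):
        holder = rng.choice([d for d in devices if d != holder])
    return holder


def linkability(hops: int, n_devices: int = 20, trials: int = 10_000, seed: int = 1) -> float:
    """Fraction of reports uploaded by the same device that generated them."""
    rng = random.Random(seed)
    devices = [f"phone{i}" for i in range(n_devices)]
    same = 0
    for _ in range(trials):
        origin = rng.choice(devices)
        if relay_uploader(origin, devices, hops, rng) == origin:
            same += 1
    return same / trials


if __name__ == "__main__":
    for hops in range(4):
        print(hops, linkability(hops))
```

With zero hops every report is trivially linkable (1.0); after one hop the originator never uploads its own report, and with two or more hops the rate settles close to 1/n_devices (about 0.05 for 20 devices), i.e. no better than guessing among the participating phones.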