<- The World Wide Web Table of Content JonDonym ->

Data Collection Techniques

The actual techniques employed by the data miners on the Web are briefly introduced below.

Cookies

Cookies are used to identify and remember a web surfer. Without cookies, certain services would be complicated to realize, because HTTP is a stateless protocol: when a user requests a page from a web server, the server cannot readily tell which earlier requests came from that same user.

Nevertheless, some services require a sort of memory. Shopping portals are an example: a server has to remember what goods were placed into the virtual shopping cart. This "memory" is usually written into cookies, i.e. small text files which the server sends to your browser along with a requested page. When your browser contacts the server again, it automatically sends back the cookie stored earlier. The server can thereby assign the right shopping cart to you.
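The round trip described above can be sketched with Python's standard http.cookies module. This is only a minimal illustration, not the code of any real shop: the cookie name cart_id, its value and the carts dictionary are assumptions made for the example.

```python
from http.cookies import SimpleCookie

# Hypothetical server-side storage: cart ID -> goods in the cart.
carts = {"abc123": ["book", "headphones"]}

# 1. Server response: hand the browser a cookie carrying the cart ID.
response_cookie = SimpleCookie()
response_cookie["cart_id"] = "abc123"
set_cookie_header = response_cookie.output(header="Set-Cookie:")

# 2. Browser: store the cookie and attach it to the next request.
browser_jar = SimpleCookie()
browser_jar.load(set_cookie_header.split(":", 1)[1].strip())
cookie_header = "; ".join(
    f"{name}={morsel.value}" for name, morsel in browser_jar.items()
)

# 3. Server, next request: read the cookie and find the matching cart.
request_cookie = SimpleCookie()
request_cookie.load(cookie_header)
cart = carts[request_cookie["cart_id"].value]
print(cart)
```

The server never needs to know who you are for this to work. It only needs the browser to return the same value on every request.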

But cookies can also be abused to track your steps on the Internet. This works exceptionally well for web portals (e.g. Yahoo) and search engines (e.g. Google), because you use these a lot in order to reach other websites. With cookies, a web host can record large parts of your surfing behavior over years and easily relate it to you as a person through your "accumulated" profile data. Most Internet users have collected hundreds of cookies from various websites on their PC without their knowledge.

The following example shows just a small number of the cookies you receive when you request www.nytimes.com:

Cookies set by nytimes.com

Websites embedding several external ad and tracking services are nothing unusual. A 2012 study by the University of California, Berkeley analyzed the top 100 websites and found 6,485 cookies. Of these, 5,493 were set by third parties, i.e. not by the website the user intentionally visited. While surfing these 100 websites, data was transmitted to 600 servers with the help of those cookies.

Disabling third-party cookies reduces the number of tracking cookies but does not block tracking entirely. Sophisticated tracking services like Yahoo! Web Analytics and others are able to set tracking cookies in a first-party context. Our blog article First-Party Cookies gives some examples of first-party cookies used by third-party tracking services. Two techniques are mainly used:

At the very least, you should delete all cookies at the end of your browser session. To avoid tracking of your browsing habits, we recommend blocking all cookies and JavaScript. Enable session cookies or JavaScript only temporarily, for trusted websites that need them to work as expected.

Flash cookies

Flash cookies (Local Shared Objects, LSOs) have been deployed since 2005 to restore deleted cookies with the same identification mark. They are stored by the Flash player independently of your browser and outside of its cookie management.

Clearspring Technologies Inc. used this technique successfully until it was sued in 2010, and promoted its precise data on 200 million Internet users. Ebay uses Flash cookies too.

There is an overall downward trend in the use of Flash cookies. Websites may be changing strategies here by using Evercookies.

Evercookies

More than 80% of users disapprove of being tracked while surfing the web, and many use browser settings which prevent long-term tracking. Ad and tracking networks are therefore moving on to more sophisticated methods to distinguish each user. With "evercookie - never forget", Samy Kamkar gives an overview of possible methods to mark Internet users individually. Some examples of "cookieless cookies":

A study by the University of California, Berkeley exposed the methods of Space Pencil, Inc., aka KISSmetrics, which, in addition to cookies and Flash cookies, used cache cookies via ETags, DOMStorage and IE-userData in order to distinguish each user.

The usage of Evercookies is rising across popular websites. According to the Web Privacy Census, only 19% of popular websites were using Evercookies in 2011. In 2012, one year later, 38% of popular websites were using HTML5 local storage for identification marks.

Fingerprinting of your Browser

Fingerprinting can be used for tracking without setting an identification mark such as a cookie or Evercookie in your browser. The Panopticlick project of EFF.org demonstrated how browser fingerprinting works: more than 80% of surfers are traceable by a unique browser fingerprint. The recognition rate increases to 94% if additional information can be collected via Flash or Java applets. The study "Dusting the web for fingerprinters" by KU Leuven (Belgium) estimated the usage of fingerprinting "in the wild" and discovered JavaScript tracking scripts and Flash applets.
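How a set of harmless-looking browser properties turns into an identifier can be sketched as follows. The attribute names and values are assumptions chosen for illustration, and hashing them with SHA-256 is just one simple way to condense them. It is not necessarily what Panopticlick or any real tracker does.

```python
import hashlib

def browser_fingerprint(attributes):
    """Condense observed browser attributes into a short identifier."""
    # Sort the keys so that the order of collection does not matter.
    canonical = "\n".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Hypothetical values a script might read from one browser.
visitor = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "screen": "1920x1080x24",
    "timezone": "UTC+1",
    "fonts": "Arial,DejaVu Sans,Liberation Serif",
}

print(browser_fingerprint(visitor))
```

Nothing is stored in the browser here. The same combination of values simply yields the same identifier on every visit, while a single differing value (for example another font list) yields a different one.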

The information for fingerprinting may be collected by the following methods; arbitrary combinations are possible:

Examples of the use of browser fingerprinting for tracking purposes:

Browser Cache

From the contents of your browser cache, one can draw conclusions about previously visited, and thus already cached, websites. Together with every website, the server sends an ETag, which is stored in the browser cache. When the website is requested again, the ETag is sent first to ask for changes. This tag may contain a unique user ID. KISSmetrics used ETags in this way to identify visitors of some top 100 websites.
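The ETag mechanism can be illustrated with a small simulation. It assumes a server that hands every unknown browser a random ETag and recognizes it again by the If-None-Match request header, which is how browsers return a cached ETag. The class and its fields are made up for this sketch and do not reproduce the KISSmetrics implementation.

```python
import uuid

class ETagTrackingServer:
    """Toy server that misuses the ETag of a cached resource as a user ID."""

    def __init__(self):
        self.visits = {}  # ETag -> number of requests seen with it

    def handle(self, request_headers):
        etag = request_headers.get("If-None-Match")
        if etag in self.visits:
            # Known browser: count the visit and answer "304 Not Modified",
            # so the browser keeps the cached copy and its ETag.
            self.visits[etag] += 1
            return 304, {"ETag": etag}
        # Unknown browser: issue a fresh, unique ETag.
        etag = '"' + uuid.uuid4().hex + '"'
        self.visits[etag] = 1
        return 200, {"ETag": etag}

server = ETagTrackingServer()
status_first, headers = server.handle({})
# The browser caches the resource and revalidates it on the next visit.
status_second, _ = server.handle({"If-None-Match": headers["ETag"]})
```

Deleting cookies does not affect this identifier. It disappears only when the cache is cleared.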

Additionally, the time required for loading a website changes when part of it is already in the browser cache. By cleverly placing images on the website, the server can probe the contents of the cache one by one.

Deactivating your cache would severely reduce your surfing speed, which is why we don't recommend it. Instead, JonDoFox has an integrated protective mechanism which bypasses the cache for third-party content. Also, the cache is deleted automatically when you close the browser. A website can thus no longer gain information about other websites, only about itself.

Referer

A given web page's referer is the URL of the web page containing the link that the user followed to reach the current page. In the case of ads, HTML bugs or like buttons, it is the URL of the current page.

In the paper Privacy leakage vs. Protection measures (PDF), more than 100 popular websites were examined. Many websites leaked pieces of private information to third parties via the referer. 56% of the tested websites leaked information such as email address, real name or place of residence to tracking services after login. A small example shows the request for a Doubleclick banner after login at the website http://sports.com:

GET http://ad.doubleclick.net/adj/....
Referer: http://submit.sports.com/...?email=name@email.com
Cookie: id=123456789.....

A unique tracking cookie is used, so it was possible to connect the email address with the collected tracking data.
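What the receiving side can do with such a request is sketched below. The function and the profiles dictionary are hypothetical, the path /form stands in for the part of the URL that is elided in the example, and the cookie ID is shortened to the digits shown there.

```python
from urllib.parse import parse_qs, urlsplit

def record_request(profiles, cookie_id, referer):
    """Attach every parameter found in the referer to the cookie's profile."""
    query = urlsplit(referer).query
    leaked = {name: values[0] for name, values in parse_qs(query).items()}
    profiles.setdefault(cookie_id, {}).update(leaked)
    return leaked

profiles = {}
record_request(
    profiles,
    cookie_id="123456789",
    referer="http://submit.sports.com/form?email=name@email.com",
)
print(profiles)
```

From then on, every request carrying the same cookie can be attributed to the leaked email address.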

To avoid the leakage of private information and the use of the referer for tracking purposes, JonDoFox removes all parameters from the referer and clears the referer entirely if the domain has changed.
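The two rules can be sketched in a few lines. This is not the JonDoFox code. In particular, comparing full host names to decide whether "the domain has changed" is an assumption made for the example, and the URLs use the hypothetical path /form.

```python
from urllib.parse import urlsplit, urlunsplit

def scrub_referer(referer, target_url):
    """Return the referer to send to target_url, or None to send none."""
    source = urlsplit(referer)
    target = urlsplit(target_url)
    if source.hostname != target.hostname:
        # Different host: do not reveal where the user came from.
        return None
    # Same host: keep scheme, host and path, drop query string and fragment.
    return urlunsplit((source.scheme, source.netloc, source.path, "", ""))

page = "http://submit.sports.com/form?email=name@email.com"
to_ad_server = scrub_referer(page, "http://ad.doubleclick.net/adj/banner")
to_same_site = scrub_referer(page, "http://submit.sports.com/next")
```

With these rules the ad server receives no referer at all, and even the website itself no longer sees the email parameter repeated in later requests.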

Danger of JavaScript

Using JavaScript, it is possible to set identification marks (Evercookies, "cookieless cookies") or to access a lot of information about your browser, your desktop settings and your hardware. All this information may be accumulated into an individual fingerprint of your browser, by which a user may be recognized. The IP Check shows only some examples of values which may be gathered (JavaScript needs to be enabled). It also demonstrates the labeling of users by Evercookies created with JavaScript.

It is also possible to compromise your browser or operating system through software bugs or a badly designed website. An attacker can, for example, inject malicious JavaScript code via cross-site scripting and thus try to phish for login credentials, bank account details or other sensitive data.

Therefore, we recommend activating JavaScript only when needed and blocking it otherwise.

Dangerous Plug-ins

Web content accessible through browser plug-ins such as Flash, Java, ActiveX and Silverlight renders the Web more dynamic and colorful but also more dangerous, because it allows websites to execute code on your PC. If executed, such plug-in content is able to read some details about your computer and network configuration and send them to the web server. By certain manipulations, it can moreover read and edit files on your machine and, in an extreme case, even gain complete control over it. Plug-ins can also circumvent the proxy settings your browser uses for JonDonym or Tor and thereby leak your real IP address.

Beware especially of signed Java applets: when you accept its signature, the applet, and thereby the visited web server, automatically receives all user rights on your machine. In particular, it may then read your IP address, your MAC address and even hard disk contents. It does not help to surf only websites you deem trustworthy, either. This concept is outdated, since nowadays even numerous large and well-known websites are being hacked and filled with malicious code. The hacking suite for governmental interception by HackingTeam uses signed Java code for deployment on Windows, MacOS, Linux and smartphones.

Only deactivating plug-in content provides real security. Websites which demand the use of active plug-ins should be avoided if possible. YouTube videos and videos of other such portals which are rendered by Flash may be downloaded and then viewed safely with a video player.

Browser History

A publication from the University of California provides an analysis of the top 50,000 websites. 1% of these websites collect information about web surfers by history sniffing. Using malicious JavaScript code and CSS hacks, information about visited websites was collected. Webmasters who are not familiar with sniffing technologies can use services like Tealium or Beencounter for real-time history sniffing.

The collected information is not only used for advertisements; it can be used for de-anonymisation of surfers too. A publication by Isec shows a possible way: using the browser history, the groups of the social network Xing that a surfer had visited were collected. Because hardly any two people are members of exactly the same groups in a social network, it was possible to obtain real names and e-mail addresses.
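The reasoning behind this attack is a simple set intersection, sketched below with invented names and groups. The membership lists stand in for what an attacker could collect from public group pages. None of the data comes from the Isec publication.

```python
def matching_members(visited_groups, memberships):
    """Return all members whose groups include every sniffed group."""
    return sorted(
        name
        for name, groups in memberships.items()
        if visited_groups <= groups
    )

# Hypothetical public membership lists: member -> groups.
memberships = {
    "Alice": {"python", "privacy", "chess"},
    "Bob": {"python", "sailing"},
    "Carol": {"privacy", "chess"},
}

one_group = matching_members({"python"}, memberships)
two_groups = matching_members({"python", "privacy"}, memberships)
```

Each additional group found in the browser history shrinks the list of candidates. In this toy data, two groups already single out one person.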

Recent versions of popular browsers like Firefox are protected against history sniffing.

Webbugs and Banner Ads

Very likely, you will find one or more cookies in your browser from data miners such as doubleclick.com, advertisement.com or Google, although you have never even visited their websites. This is because these enterprises use a simple trick on other websites to plant cookies on you nevertheless and watch your browsing: webbugs.

"Webbugs" are usually pictures of 1x1 pixels and therefore invisible to the viewer. However, they can also be coded into banner ads embedded in a website. The website contains a picture (webbug) that is loaded from another server running a statistics service (such as Doubleclick, Google Analytics). Thereby the statistics service may set or edit a cookie in your browser unnoticeably. The browser will then send this cookie back to the statistics service with every new request for a site where any webbug of this service is embedded. If the service is used on many different websites, it can now track large parts of your browsing session. If the owner of the statistics service moreover collaborates with the owner of your preferred search engine, he gets an almost complete picture of your Internet activities.
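The following simulation sketches how a statistics service can assemble a browsing profile from nothing but pixel requests. The class, the cookie format and the page URLs are invented for the example. Real services work on the same principle but not necessarily in this way.

```python
import itertools

class WebBugTracker:
    """Toy statistics service that serves a 1x1 pixel and sets a cookie."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.profiles = {}  # cookie -> pages on which the pixel was loaded

    def serve_pixel(self, cookie, referer):
        if cookie not in self.profiles:
            # First contact: plant a new identifying cookie.
            cookie = f"id={next(self._next_id)}"
            self.profiles[cookie] = []
        # The referer names the page that embedded the pixel.
        self.profiles[cookie].append(referer)
        return cookie

tracker = WebBugTracker()
cookie = None  # the browser has no cookie from the tracker yet
for page in ["http://news.example/politics",
             "http://shop.example/cart",
             "http://forum.example/health"]:
    # Each page embeds the pixel; the browser returns the stored cookie.
    cookie = tracker.serve_pixel(cookie, page)

print(tracker.profiles[cookie])
```

Although the three pages belong to unrelated websites, the tracker ends up with a single profile listing all of them.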

The privacy functions of most current browsers, which either flatly deny cookies or only deny third-party cookies, and alternatively delete all cookie data when the browser is closed, do not achieve optimal protection. To prevent session tracking, all cookies should be blocked by default if possible and allowed only when needed, for the duration of the session. JonDoFox is therefore preconfigured to deny all cookies but lets you allow them for single websites with two mouse clicks. We recommend allowing cookies only on a temporary basis, so that they will be automatically blocked again after the session.

Another nasty feature of webbugs is that, besides cookies, they also send your IP address to the statistics service with every request. Even with a very good browser configuration, with cookies switched off and webbug filters in use, you can never reliably prevent this. The only effective protection against this is an anonymisation service like JonDonym.

TCP Timestamps

The Transmission Control Protocol (TCP) is a protocol for transferring data between computers. It is necessary for using Internet services like HTTP (WWW), SMTP (e-mail) and FTP. When your computer sends a request for a website, for example, this data is sent in many small so-called TCP packets. Besides the request data, such a TCP packet also contains some optional information fields (optional headers). One of those options is the TCP timestamp. The value of this timestamp is proportional to the current time of your computer and is incremented according to your computer's internal clock.

The timestamp may be used by the client and/or server machine for performance optimization. However, an Internet server may recognize and track your computer by observing those timestamps: By measuring the clock skew of the timestamps, it may calculate an individual clock skew profile for your computer. Moreover, it may estimate the time when your machine was last booted. These tricks work even if you have otherwise perfectly anonymised your Internet connections.
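A clock skew estimate can be sketched as a straight-line fit: the observer notes when each packet arrives and which timestamp it carries, and the slope of the difference between both clocks is the skew. The least-squares fit and the tick rate of 1000 per second are assumptions made for this sketch. Real systems use different tick rates, and published measurement methods may fit the line differently.

```python
def estimate_clock_skew_ppm(observations, ticks_per_second=1000):
    """Estimate a sender's clock skew in parts per million (ppm).

    observations: list of (receive_time_seconds, tcp_timestamp_ticks).
    """
    t0, ts0 = observations[0]
    elapsed = [t - t0 for t, _ in observations]
    # Offset between the sender's clock and the observer's clock, in seconds.
    offsets = [
        (ts - ts0) / ticks_per_second - x
        for (_, ts), x in zip(observations, elapsed)
    ]
    n = len(elapsed)
    mean_x = sum(elapsed) / n
    mean_y = sum(offsets) / n
    # Least-squares slope of offset over elapsed time.
    slope = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(elapsed, offsets)
    ) / sum((x - mean_x) ** 2 for x in elapsed)
    return slope * 1e6

# Synthetic data: one packet per minute for an hour, sent by a machine
# whose clock runs 50 ppm fast (60003 ticks per 60 seconds).
samples = [(60.0 * k, 60003 * k) for k in range(61)]
skew = estimate_clock_skew_ppm(samples)
```

Because the skew is a property of the hardware clock, the same value can show up again after the machine has changed its IP address or deleted all cookies.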

If you are using JonDonym, however, you are protected against being observed this way: the JonDonym mixes automatically replace your potentially insecure TCP packets with their own.

IP Address

The IP address is assigned to you by your provider when you connect to the Internet. The provider usually saves it for months or even years together with your customer data and your online time. It is your distinct identifier on the Internet and is sent along whenever you make a direct connection to any Internet service. The IP address tells the server where to send its response. As long as your IP does not change, it is easy to monitor when you contacted which website. The IP also reveals your provider, often your location and sometimes (in the case of a company or computer center) even what terminal you are on. In many cases, an IP address relates directly to one person.

What your IP address reveals:

Some of the information that is given away by your IP or browser can be reviewed on the JonDos test page.

While the traces mentioned so far can be blurred without any special services, the same cannot be said about your IP address. That is why the software JonDo has been developed: in order to blur any connection between your IP and the websites you visit, JonDo connects to the service JonDonym. This service then places the servers of different organizations between your PC and the Internet. You are now surfing with the IP of the last server within a chain (cascade) of a few so-called mix servers.

MAC Address

The MAC address (MAC = Media Access Control, sometimes also called Ethernet ID, Airport ID or physical address) is the hardware address of each individual network device. Each computer may have several such physical or virtual network devices (wired (LAN), wireless (WLAN), mobile (GPRS, UMTS), virtual (VPS), ...). The MAC address serves as a unique identifier for the respective device in a local area network. On the Internet, it is neither used nor transmitted. Also, your access provider can only see it if your computer is connected to the Internet directly, for example by a modem, rather than through a router. Moreover, you may change the MAC address yourself.

 
