Monday, March 12, 2007

The Dark side of the Internet

This article is about the security and privacy issues that the new generation of the Internet poses to individuals, groups and societies at large. For some strange reason, the power, benefits and positive aspects of the Internet are very widely known, but there seems to be very little public awareness - especially among average users of the Internet - of some of the dark elements and inherent risks in making all information publicly available on the Internet. The Internet itself has changed drastically over the last few years, and it is evolving very rapidly. There is a tremendous difference between the first generation Internet and the new semantic web which is evolving - both in terms of its power, and the risks it poses if we are not careful about how we use it.

I will first introduce very briefly the three different phases of the Internet in the first section of this article, and later discuss the various privacy and security issues connected with these different phases.

Web 1.0: Web 1.0 was the first generation of the Web. During this phase the focus was primarily on building the Web, making it accessible, and commercializing it for the first time. Key areas of interest centered on protocols such as HTTP, open standard markup languages such as HTML and XML, Internet access through ISPs, the first Web browsers, Web development platforms and tools, Web-centric software languages such as Java and Javascript, the creation of Web sites, the commercialization of the Web and Web business models, and the growth of key portals on the Web.




Web 2.0: According to the Wikipedia, "Web 2.0, a phrase coined by O'Reilly Media in 2004, refers to a supposed second generation of Internet-based services — such as social networking sites, wikis, communication tools, and folksonomies — that emphasize online collaboration and sharing among users."

I would also add to this definition another trend that has been a major factor in Web 2.0 — the emergence of the mobile Internet and mobile devices (including camera phones) as a major new platform driving the adoption and growth of the Web, particularly outside of the United States.

Web 3.0: Using the same pattern as the above Wikipedia definition, Web 3.0 could be defined as: "Web 3.0, a phrase coined by John Markoff of the New York Times in 2006, refers to a supposed third generation of Internet-based services that collectively comprise what might be called 'the intelligent Web' — such as those using semantic web, microformats, natural language search, data-mining, machine learning, recommendation agents, and artificial intelligence technologies — which emphasize machine-facilitated understanding of information in order to provide a more productive and intuitive user experience."

The key here - in Web3.0 - is the "machine-facilitated understanding of information". The word "semantics" refers to meaning: understanding the essence of something. In Web3.0, this understanding is accomplished by machines. This is a tremendous advantage, but it is also potentially very dangerous - because, ultimately, machines are controlled and operated by humans.

The risk comes from the following:

• Our conversations are no longer ephemeral
• No individual is an isolated dot any more – we are all part of a connected scheme.
• Machine facilitated understanding of information (or Semantic Modeling) makes it possible to discover information and relationships that were previously impossible.
• The Internet allows us to drop our inhibitions - and many people "upload" very personal information about themselves, their lives, their families and friends. This information can potentially be used to generate a complete psychological profile of people. Such information can potentially be misused by governments, by corporations, by the mafia and other dark elements.
• You may buy your computer – but you don’t own it any more. Spyware, malware, and corporate and governmental espionage all compete for control of it.

Our Conversations are not ephemeral
Imagine you are traveling in a taxi, a bus or a train with your spouse, a colleague or a friend. Generally, in such situations we talk, and we talk about personal and intimate subjects - forgetting that people around us can hear our conversations. You might discuss a weekend vacation or a travel plan with your partner. When you come back home after your vacation, you discover that your house was robbed over the weekend.

These things have happened in the past, and they still happen. People talk about very private and intimate things in public, and sometimes they face unpleasant consequences of such 'inadvertent' public conversations. But, in general, in the past, everyday conversations - whether face-to-face or in public - evaporated as soon as they were spoken. You could be reasonably sure that your talk simply vanished into the ether and no trace was left. Of course, organized crime bosses in movies used to worry about their phones being tapped - but that was an exception. Privacy was the default assumption.

Well, this is no longer true. Our conversations on the Internet - the emails we send, our messages on social networking sites like Orkut and MySpace, the comments we leave on our friends' blogs, the blogs we write, the chat programs we use - are all recorded, stored and analyzed. They are stored permanently - forever. Nothing can be deleted from the Internet - at least not by us.

Every single thing you type on your keyboard is stored. We know this intellectually, but we have never internalized it properly. There are tools that allow both businesses and government agencies to monitor and log IM conversations. E-mail can be saved by your ISP or by the IT department in your corporation. Gmail, for example, saves everything, even if you delete it. Moreover, these conversations are saved by many different organizations - not just one. Your employer may be storing everything you type on the net, your ISP may be storing everything you type, governments and defense organizations may be storing everything you type. We don't even know who else may be storing our conversations and our messages. We don't know who they are, and we certainly do not know what they do with the information.

There are hardly any laws that regulate who can 'tap' into the information that we send over the Internet. Phone tapping is illegal in most countries - even if the police want to tap someone's phone, they still need a legal order from a magistrate. But, on the Internet, anyone can store anything. For most practical purposes, there is no law and no legislation.

The implications of this for personal safety, liberty and freedom are enormous. You may lose your job - because your employer finds out that you are spending too much time on the Internet and can now 'prove' it. One may have to face criminal prosecution, divorce proceedings or simply some unpleasant public embarrassment. Such things have happened already - former U.S. Rep. Mark Foley sent salacious instant messages to teenage congressional pages and resigned from Congress within days of the messages becoming public. IBM fired an employee because he was found to be chatting in Internet chat rooms. Many people have been bullied, kidnapped and robbed because of the information they keep on the Internet.

If you find this disturbing, you should. Fewer conversations are ephemeral, and we're losing control over the data. We trust our ISPs, employers and cellphone companies with our privacy, but again and again they've proven they can't be trusted. Identity thieves routinely gain access to these repositories of our information. Paris Hilton and other celebrities have been the victims of hackers breaking into their cellphone providers' networks.

If you don't find this disturbing - read on. Ask yourself what information about you is already on the Internet, and how much information you have put there already. Do you have a blog? Do you have an email-id? Do you leave comments on others' blogs? Do you have your photograph on the Internet somewhere? Do you have membership in some news groups? Do you have an account on social network sites like Orkut? Do your friends write public 'scraps' in your Orkut profile? Do you regularly chat on the Internet? Do you have online access to your bank account? Do you buy anything on the Internet using your credit card? That's a lot of data that you are putting on the Internet, and most of this information is very personal and can be misused.

Not convinced yet? Here are some examples:

I have a blog in my local language - Telugu. The number of Telugu bloggers is still very small - by Internet standards they don't even count. Because they are few, they are all friends and they know each other very well. Some of these bloggers use pseudonyms, and they don't keep their photographs or any other information on their profile pages. But, it is very easy to find out roughly who they are, where they live and what companies they work for. For example, if they visit my blog - by looking at the IP-address records, I can figure out where they are from and their ISP or company name. Well, the IP-statistics still do not tell you the name of the person. But, suppose they leave a comment on my blog - then it is very easy to correlate the IP-statistics with the person. I still do not know their name (if they choose a pseudonym), but I know where they live, and which company they work for. Once you have figured out their IP-address, you can then generate a lot of statistics about them - like their usual 'Internet Visiting Hours', whether they browse from home or from work, how long they usually spend on the Internet, what their favorite blogs are and so on. It takes only a couple of minutes for an average human being who does not possess any of the powerful tools that governments, crime syndicates, corporations and defense departments have. Imagine how much information about you can be learned by those 'control' minded groups?
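To make the correlation step concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the visit log, the comment records, the pseudonym and the function name `profile_by_pseudonym` are all made up, and the IP addresses come from ranges reserved for documentation. A real blog's statistics would look different, but the join on the IP address is the whole trick.

```python
from collections import defaultdict

# Hypothetical visit log: (ip_address, hour_of_day, page). All values are made up.
visits = [
    ("203.0.113.7", 22, "/post-1"),
    ("203.0.113.7", 23, "/post-2"),
    ("198.51.100.4", 10, "/post-1"),
    ("203.0.113.7", 22, "/post-3"),
]

# Hypothetical comment records: (ip_address, pseudonym used in the comment).
comments = [
    ("203.0.113.7", "nightowl"),
]


def profile_by_pseudonym(visits, comments):
    """Join comments to visits on IP address and summarise browsing habits."""
    ip_to_name = {ip: name for ip, name in comments}
    profiles = defaultdict(lambda: {"hours": set(), "pages": []})
    for ip, hour, page in visits:
        name = ip_to_name.get(ip)
        if name is None:
            continue  # this visitor never commented, so the IP stays unnamed
        profiles[name]["hours"].add(hour)
        profiles[name]["pages"].append(page)
    return dict(profiles)


profiles = profile_by_pseudonym(visits, comments)
print(sorted(profiles["nightowl"]["hours"]))  # [22, 23]
print(profiles["nightowl"]["pages"])          # ['/post-1', '/post-2', '/post-3']
```

One comment is enough to attach a pseudonym to every past and future visit from that address, which is why the 'Internet Visiting Hours' fall out so easily.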

"You are innocent until proven guilty" was the old maxim. With the new 'security madness' of governments all over the world, "you are guilty until proven innocent" is the new heuristic.

The moral is clear: If you type it and send it, prepare to explain it in public later.

This was web1.0. Now, enter web2.0:

No individual is an isolated dot any more – we are all part of a connected scheme

I had a friend - we lost touch with each other for a long time, almost 15 years. I don't know much about her personal details - since we were mostly pen friends. All I knew about her was her name, her family name, her brother's name and her husband's name - and where they were from. I don't even know where all of them are living now.

Since she was a very good friend of mine, I tried several times to find out where she was and how she was, but couldn't locate her - until social networking sites like Orkut came along. Let's say her brother's name is Suresh, and her husband's name is Adarsh. I searched Orkut for Suresh together with his 'native city'. I found quite a few people by that name. How do I know whether the particular Suresh I was looking for is one of them or not? All I do is look at whether any of these people have a friend called Adarsh. Bingo - they were there. It took ten minutes to locate them. The rest is easy - you make contact, get the phone number and call her.

I called her. After the initial joy and excitement of meeting an old friend was over, she suddenly realized that it should have been impossible for me to find her phone number. So, she asked how I got her number.

I told her that I found it on the Internet. She was surprised. I told her that I also happened to know her residence address, what her husband does, that they recently went on a vacation to Chennai, that her in-laws live in such and such a place, that she has a five year old son and that the boy had some health problems recently, that she is doing a part time job, that they bought a computer recently and so on and so forth. BTW, she has never touched a computer in her life, and she doesn't even have an email address.

She was totally dumbfounded and flabbergasted. She asked whether I work for the CBI. Well, I don't work for any intelligence agency. All the information was there on Orkut - for anyone to see. It is surprising how much information we unintentionally keep on the net.

The point here is something even more serious. Today, our presence on the net is not isolated - we are - by our own volition - connected to almost everyone else in the world. We expose our friends, colleagues, associates, classmates, relatives, family and everyone else we know. We don’t stop with that - we also expose how we came to know each of them, we tell the entire world what our interests are, which causes motivate us. How much more can you advertise about yourself to the world?

If her husband and brother were not linked together on the net, it would have been very difficult for me to locate them. It is the 'connectedness' that allows even an average person to quickly find out so much information - even about someone who does not have any internet presence.
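The search I did by hand can be sketched in a few lines of Python. This is a toy illustration, not how Orkut works internally: the profile records, the city name "Guntur" and the function `find_candidates` are all hypothetical, invented only to show the idea of filtering by name and city and then checking the friend list.

```python
# Hypothetical profiles; every id, name, city and friend list here is made up.
people = {
    "p1": {"name": "Suresh", "city": "Guntur", "friends": ["p4"]},
    "p2": {"name": "Suresh", "city": "Guntur", "friends": ["p3"]},
    "p3": {"name": "Adarsh", "city": "Chennai", "friends": ["p2"]},
    "p4": {"name": "Ramesh", "city": "Guntur", "friends": ["p1"]},
}


def find_candidates(people, name, city, friend_name):
    """Return ids of people with this name and city who list a friend called friend_name."""
    matches = []
    for person_id, person in people.items():
        if person["name"] != name or person["city"] != city:
            continue
        friend_names = {people[f]["name"] for f in person["friends"]}
        if friend_name in friend_names:
            matches.append(person_id)
    return matches


print(find_candidates(people, "Suresh", "Guntur", "Adarsh"))  # ['p2']
```

Two people called Suresh live in the same city, but only one of them has a friend called Adarsh. The link between the two men is what narrows the search to a single profile.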

We have several such networking sites - there are social networking sites like Orkut, there are professional networking sites like LinkedIn, there are specialty networking sites and so on. We have a presence in each of them. And, we are connected with so many of our friends and colleagues through these networks. When I log on to Orkut, I get a message right at the top - "you are connected to 39 million people through 40 friends". The question is: do I want to be connected to 39 million people?

There is an interesting application developed at the University of Virginia called the Oracle of Bacon. This application can connect any movie actor from any country - past or present - to the Hollywood actor Kevin Bacon within 8 steps. It can connect somebody like S.V. Ranga Rao with Kevin Bacon. Surprised? How can S.V. Ranga Rao - a Telugu actor who acted mostly in Telugu movies - be connected to Kevin Bacon? Well, he is - and he is connected within 6 steps.

There is a theory, popularly known as 'six degrees of separation', that anyone in the world can be connected to anyone else in the world within a few steps. The number of steps is the 'distance' between two people. Using Orkut and other social networking sites - you are the Kevin Bacon. You can be connected to anyone in the world. What is our distance to the most wanted man in the world? Maybe six or seven? We don't know - all we know is that we are connected. Suppose some idiot from a government agency decides to 'round up' everyone who is connected with the infamous B.L of A.Q fame - we might be on the list. All because you have some forty friends on Orkut and you don't know most of them.
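Counting the steps between two people is a standard shortest-path problem, and a breadth-first search is enough to solve it. The sketch below is only an illustration: the friend lists and the function name `degrees_of_separation` are made up, and I am not claiming that this is how the Oracle of Bacon or Orkut computes it.

```python
from collections import deque

# Hypothetical friend lists; each key maps a person to the people they list as friends.
friends = {
    "you": ["asha", "ravi"],
    "asha": ["you", "kiran"],
    "ravi": ["you"],
    "kiran": ["asha", "stranger"],
    "stranger": ["kiran"],
    "hermit": [],
}


def degrees_of_separation(graph, start, target):
    """Breadth-first search: fewest friendship hops from start to target, or None."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, distance = queue.popleft()
        for friend in graph.get(person, []):
            if friend in seen:
                continue
            if friend == target:
                return distance + 1
            seen.add(friend)
            queue.append((friend, distance + 1))
    return None  # no chain of friends connects the two people


print(degrees_of_separation(friends, "you", "stranger"))  # 3
print(degrees_of_separation(friends, "you", "hermit"))    # None
```

Notice that "you" never listed "stranger" as a friend. The connection exists only because of what other people chose to publish.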

Is this information available to governments, mafia, drug-traffickers, terrorists, extortion agents, thieves and all other dark forces? Sure it is. Do they have more powerful tools to analyze this information? Sure - they do. Can they and will they misuse this information - sure they can, they will and they do.

Recently the FBI issued a warning asking parents to monitor their children's Internet usage. The reason - many children are lured by sex offenders and the mafia using social networking sites.

Recently a young boy killed himself because he was cyber-bullied and he couldn't take it any more.

This is Web2.0. It only gets worse with Web3.0.

Semantic Modeling, psychological profiling, spear phishing
Earlier, I stated that "understanding" and "extracting" meaning out of information can now be done by machines. Let me explain this in layman's terms. Many of you may have read a book called 'Blink'. In the first chapter of this book, the author explains how a trained psychologist can listen to five minutes of everyday conversation between a couple, and figure out very accurately whether the marriage will last, and if it does - for how long.

When we meet a person and spend time with them, we generally learn something about that person. All of us do some amount of 'psychological profiling'. This is a normal human trait. We do it for our self defense, to know more about the other person, to make friends with them, to understand how to conduct ourselves with the other person, to gain some favors and so on.

But, it is in general impossible for a human being to remember every conversation the other person has had with everyone in the world, and analyze it later at leisure. Most of our 'estimate' of a person happens in a split-second; it is - in general - not a planned, conscious activity. Suppose all our talk, along with a lot of information about ourselves, is available electronically - and suppose machines are capable of 'analyzing' this information and drawing up a psychological profile of a person - isn't that very powerful information to possess?

Recently, Amit Sheth and others published a very interesting paper called "Semantic Analytics on Social Networks: Experiences in Addressing the Problem of Conflict of Interest Detection". Interestingly, the project was sponsored by ARPA - which is the US defense department's advanced research agency. The researchers used publicly available information about authors of scientific papers from two different sources, and using the semantic modeling techniques they developed, they discovered that many authors - to put it in layman's terms - cheat to get their papers published. Suppose there are two authors called X and Y. Let's say that X and Y worked together in some research institution in the past and published a joint paper. Later, X left the institution and joined some other organization, and Y continues to work for the same institution. Now, when X sends a paper for publication, Y acts as the reviewer and accepts the paper for publication. This is called a conflict of interest - because people who know each other are not supposed to review each other's work.
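The X and Y example boils down to checking each reviewer against the author's past collaborators. Here is a deliberately simplified sketch in Python. It is not the method used in the paper, which works on much richer data; the co-authorship pairs, the review assignments and the function `possible_conflicts` are all hypothetical.

```python
# Hypothetical data: pairs of people who have published a joint paper.
coauthored = [
    ("X", "Y"),
    ("Y", "Z"),
]

# Hypothetical review assignments: (reviewer, author of the submitted paper).
reviews = [
    ("Y", "X"),
    ("Z", "X"),
    ("W", "X"),
]


def possible_conflicts(coauthored, reviews):
    """Flag review assignments where reviewer and author have co-authored before."""
    known_pairs = {frozenset(pair) for pair in coauthored}
    return [
        (reviewer, author)
        for reviewer, author in reviews
        if frozenset((reviewer, author)) in known_pairs
    ]


print(possible_conflicts(coauthored, reviews))  # [('Y', 'X')]
```

Using `frozenset` makes the pair order irrelevant, so it does not matter whether the joint paper was recorded as ("X", "Y") or ("Y", "X"). Z reviewed X's paper too, but Z has only worked with Y, so this simple check does not flag it.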

Earlier, finding out such information was practically impossible. How does one journal know that Y and X know each other? But, with social network sites, and professional network sites - such information can be very easily detected by computers today. Now, this is a harmless, useful research paper. But, what are its potential applications?

A super Google can be easily developed which can 'extract' very meaningful and powerful information about people and their lives. If any complaint was ever filed against you, that information can be 'fed' into such an application. It can easily discover your financial net worth by analyzing your online purchase history, or your online bank account transactions. It can easily generate a psychological profile of you from your online information. It can easily draw inferences of various kinds about a person - whether the person is anti-government, whether he/she is likely to be a political activist, whether they can easily be made to submit themselves to some kind of brain washing and so on.

If the Web widely adopts RDF, the W3C's standard for describing data, developing such a super 'Google' becomes a relatively easy task. Till now, such programs required very expensive and specialized computing models. But, with all 'information' and 'meaning' available simply as attributes of an object, we no longer require any complex computational models to generate inferences; it can be done by any normal database query. This means that anyone who has access to the data - everyone - can easily develop applications to 'discover' information about people.

Implications of RDF and Semantic Databases

After 9/11, the US government and several other governments launched massive initiatives to collect large amounts of data about people and store it in one central place. The Bush Administration started a program called Total Information Awareness (TIA), which was later scrapped because of the public outcry. The program was meant to let security agencies collect and combine almost any information about anyone. Though the program was officially cancelled, parts of it reportedly continue under other names within the defence department's projects. The name changed, but the work continues.

With RDF, there is no need to develop any centralized databases anymore. Because, every object on the Web is capable of answering some interesting questions about itself - one can simply get the information from multiple sites and run any query on a distributed database.

If this sounds too technical - here is an example:
These days, a typical Internet usage by an average user can be summarized as:
1. An Email-Id with web-based email services like Google, Yahoo etc.
2. Probably an official or corporate Email-Id that is used for business and professional communication
3. Several Login-Ids and some activity at several e-commerce sites - like Amazon.com, eBay
4. Some history of online purchases using credit cards - airline tickets, music and books, groceries
5. A login-id with at least one social networking site like Orkut, LinkedIn, MySpace and so on
6. Probably a blog and a personal website.

Add to this a lot more online data that is created without our knowledge and permission. If you traveled in an airplane, your travel history is recorded by that airline company. If you visited some countries either as a tourist or on work, your entire travel history is recorded by many governments, travel companies and so on. If you booked your hotel reservations online, this history is recorded by various organizations. If you passed through security zones at several places, your photographs and videos have been captured and stored.

With the Semantic Web, all this information can potentially be linked together without any significant effort. There is no need to bring all these various information sources into one centralized database anymore. This is the power of semantic databases. With the advances in multimedia technologies, searching through multimedia objects - image search, video search, voice search - will also be possible quite soon. The technology has not yet reached this stage of evolution - but it will soon get there.
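Here is a rough sketch of why no central database is needed. RDF describes everything as (subject, predicate, object) statements, so records from unrelated sites can be linked wherever they share a value. The code below uses plain Python tuples rather than a real RDF library or query language, and every source, identifier, email address and function name in it is hypothetical.

```python
# Each hypothetical source publishes its own (subject, predicate, object) statements.
email_provider = [
    ("person:1", "email", "someone@example.com"),
]
social_site = [
    ("profile:9", "email", "someone@example.com"),
    ("profile:9", "friend", "profile:12"),
]
airline = [
    ("booking:77", "email", "someone@example.com"),
    ("booking:77", "destination", "Chennai"),
]


def link_by(predicate, *sources):
    """Group subjects from all sources that share the same value for a predicate."""
    groups = {}
    for source in sources:
        for subject, pred, obj in source:
            if pred == predicate:
                groups.setdefault(obj, set()).add(subject)
    return groups


def facts_about(subjects, *sources):
    """Collect every distinct (predicate, object) stated about any of the subjects."""
    return sorted({
        (pred, obj)
        for source in sources
        for subject, pred, obj in source
        if subject in subjects
    })


sources = (email_provider, social_site, airline)
linked = link_by("email", *sources)
same_person = linked["someone@example.com"]
print(sorted(same_person))  # ['booking:77', 'person:1', 'profile:9']
print(facts_about(same_person, *sources))
```

The three sites never exchanged data with each other. A shared email address is enough for a third party to merge a mailbox, a friend list and a travel record into one profile.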

This basically means that we leave an enormous amount of electronic audit trail of ourselves, some of it willingly created by us, and most of it created without our knowledge.

How can such information be misused? One example is spear phishing. Here is an excerpt from an article published in Newsweek International called "A Dangerous Game of Phishing":

"Spear phishers gather information, usually on the Internet, about an individual, and then craft a personalized e-mail more likely to dupe the mark. According to the FBI, the personalization method has proved so profitable that a significant number of spear phishers, principally located outside the United States, began applying it to death-threat extortion e-mails for the first time last December. FBI spokeswoman Cathy Milhoan says the problem is "huge."

Here's how it works: A spear phisher collects information on an (often wealthy) individual, then writes a chilling e-mail. The sender, posing as a hit man, offers to spare the recipient in exchange for a large sum of money. If the ploy doesn't work, the target receives a second e-mail, purportedly from the police, explaining that his or her name and address were found on a recently arrested murder suspect. "The victim gets scared, gets paranoid, he gets a lot of things," says Alan Paller, a cybercrime expert with the Bethesda, Maryland, SANS Institute who has testified before the U.S. Congress on the matter. The target provides personal details—including financial data—to aid the investigation.

Traditional extortion often involves tailing targets and staking out their homes to obtain the particulars—such as the appearance of a victim's daughter—that render threats credible. Today much of that information is easily gleaned from the 'Net. Dan Vogel, an Edmond, Oklahoma, former FBI profiler, says social-networking Web sites such as MySpace are "fueling" the trend."

In an article entitled "Governments research to track online networking", Christopher Dela Cruz and Megan Carr say the following:

"The Department of Homeland Security is paying Rutgers $3 million to oversee development of computing methods that could monitor suspicious social networks and opinions found in news stories, Web blogs and other Web information to identify indicators of potential terrorist activity.

The software and algorithms could rapidly detect social networks among groups by identifying who is talking to whom on public blogs and message boards, researchers said. Computers could ideally pick out entities trying to conceal themselves under different aliases.
It would also be able to sift through massive amounts of text and decipher opinions - such as anti-American sentiment - that would otherwise be difficult to do manually.

The program is designed to sift rapidly through huge amounts of data. It has also been described as a sort of "Super Google" by researchers such as Eduard Hovy at The University of Southern California, to explain the scope and quickness of the technology."

It is definitely scary. I don't even know what is possible with Web4.0.

You bought your computer, but you don't own it any more

If the above article sounds like one big conspiracy theory written by a paranoid individual - think twice. How many different entities are vying to control your computer? You bought your computer - but do you really "own" it?

In the words of Internet security expert Bruce Schneier - there's a battle raging on your computer right now -- one that pits you against worms and viruses, Trojans, spyware, automatic update features and digital rights management technologies. It's the battle to determine who owns your computer. Malicious software like spyware, malware, trojans, worms and viruses are basically agents that somehow slip into your computer and "serve" the interests of someone else - they steal your passwords, make fraudulent bank transactions, collect data about you and your Internet usage, send spam emails using your email-id and so on. Estimates are that millions of computers are part of such "bot" networks.

But, now things are not that simple. There are attempts by "legitimate" software programs - by many different media and other companies that determine what you can and cannot do with your computer. Recently, Sony released a "rootkit" program that attempted to block the user from doing something with Sony's music that Sony "considered" illegitimate. There are automatic update programs and digital rights management software that control what you can do and cannot do. They collect a lot of data about your machine and what you do with the machine that you probably do not even know. Here are some examples:

• Entertainment software: In October 2005, Sony had distributed a rootkit with several music CDs. This rootkit installed itself without the knowledge of the user. It simply installs itself when you play the music CD. Its purpose was to prevent people from doing things with the music that Sony didn't approve of: It was a DRM system. It would have been a "virus" if it were installed by a hacker. But Sony believed that it had legitimate reasons for wanting to own its customers’ machines. Interestingly, most commercial antivirus programs did not detect Sony's rootkit - simply because Sony asked them not to. Who are they serving? You? Or someone else?
• Application software: Internet Explorer users might have expected the program to incorporate easy-to-use cookie handling and pop-up blockers. After all, other browsers do, and users have found them useful in defending against internet annoyances. But Microsoft isn't just selling software to you; it sells internet advertising as well. It isn't in the company's best interest to offer users features that would adversely affect its business partners.
• Spyware: Spyware is nothing but someone else trying to own your computer. These programs eavesdrop on your behavior and report back to their real owners -- sometimes without your knowledge or consent -- about your behavior.
• Internet security: It recently came out that the firewall in Microsoft Vista will ship with half its protections turned off. Microsoft claims that large enterprise users demanded this default configuration, but that makes no sense. It's far more likely that Microsoft just doesn't want adware -- and DRM spyware -- blocked by default.
• Update: Automatic update features are another way software companies try to own your computer. While they can be useful for improving security, they also require you to trust your software vendor not to disable your computer for nonpayment, breach of contract or other presumed infractions.

So, I am not so paranoid after all - and most of it is not a conspiracy theory. We may not recognize it, but the world is fast becoming a giant airport security area - where every single movement is recorded and everyone is suspected.

The Net and the Web - these two words have always had a ring of "conspiracy and control" around them. Maybe that is not a coincidence after all.

Conclusion
What can we do? We cannot turn the clock back. Paraphrasing Oscar Wilde - we cannot live with the Internet and we cannot live without it. There is a lot of debate among different people, groups and experts on whether the Internet should be controlled. My own view is that it cannot be controlled. There is no legislation governing what can happen on the Internet, and there is little possibility that such legislation will ever emerge. As individuals - historically and culturally - we have always thought that our governments and societies will take care of our security. With the Internet, that is not possible. We created the Internet to empower the individual and it did that exceedingly well. There are always individuals and groups who intend to misuse that power. It cannot be stopped.

There is very little public awareness about the dangers of power-shift into individual hands. Hundreds of years ago the power and control rested in the hands of the kings and aristocrats who supposedly took care of their subjects. With the Industrial revolution, the power and control shifted into the hands of the corporations, businesses and media companies. Now, the power is shifting into the hands of individuals. On one hand, this is a positive development and on the other hand - this has implications on our lives that we haven't even begun to assimilate.

Most of these privacy and security articles - sound like conspiracy theories - so people do not believe in them. It is very difficult to get factual support - because by definition it is a dark-side, underworld activity - it is impossible to get scientific data. Therefore, we do not see any 'publications' about these issues in any reputed scientific journals. So, a debate between the two sides - those who argue that the Internet should be left as a self regulating mechanism and those who argue for some kind of legislation and privacy control is not possible.

There is another factor that worries me personally. As human beings, we are built to absorb information and train ourselves at a particular pace and speed. It takes several generations for us to 'internalize' certain information. For example, we all know how to take care of our security in a society - it has been internalized in us over hundreds of years. But, the Internet is evolving too fast for us to internalize. How do you retrain your "sub-conscious" in a matter of a month? When you log in to your social networking site, in order to be careful about what information you put in there, you need to develop "automatic" safe-guards, a kind of sub-conscious filter similar to the one that prompts you to check whether you locked your door or not. This takes a lot of time - and we don't get that kind of time.

But, we have to retrain ourselves and our conscious structures to take care of ourselves. We first need to educate ourselves that in this age, information is valuable and anything that has value carries a certain risk with it. We have to have some idea of this risk and learn to act accordingly. I do not see any other possibility. My advice - do not bring your living rooms and bedrooms onto the Internet. Do not expose your friends and colleagues if there is no "specific" value you obtain in doing so. Be careful about expressing yourself and your opinions on the Internet, on blogs and in other places. More importantly, it is our children who are at immense risk, and we have to educate them about the risks of the Internet - as we educate them about safe sex. Sex by itself is not dangerous - but, there are certain limits and a moral code that has to be followed. The same is the case with the Internet.


References and Further Reading:
Bruce Schneier’s Security Blog: Bruce Schneier is one of the world's most respected security and privacy experts. He writes some very insightful articles about these issues in his blog. A must read for everyone.

Kingsley Dennis' blog: Kingsley is a postdoctoral research scholar at the University of Lancaster, UK. His blog is one of the most interesting blogs on the Internet, sociology, security and privacy issues, mobilities, mind-control and many other topics.

Global Guerrillas: Very interesting site and insightful articles.

Information War: The name says it all.

Spy Blog: Don't miss this one!!


Information warfare: U.S. Department of Defense's policy documents on Information Warfare.


Surveillance Issues: We live in a world of surveillance. It is amazing how many such technologies and devices surround us.

Electronic Frontier Foundation: An organization that works on several issues related to privacy, security and other matters. A must visit for all bloggers.