Lim Boon Chuan – Singapore.TO

Blog of Lim Boon Chuan

Research Proposal

August 21

Research Proposal: Spam – A Correlation Approach

Aims of the Research

This research aims to consolidate the various approaches to the different kinds of spam in order to provide a holistic understanding of spam, with a view to its prevention and reduction.

Background to the Research

Traditional spam, or unsolicited commercial email, has continued to plague Internet users since the DEC spam of 1978. Spam accounts for some 14.5 billion messages globally per day, or about 45% of all email, and some research companies estimate that it makes up an even greater share, some 73% [1]. Web spam is also increasing, given that using search engines is the most popular online activity after email [2]. The popularity of search engines, coupled with the fact that search engines favor blogs, which makes blogs useful for Search Engine Optimization, has made blog reviews part of the spammer's arsenal. Emerging forms of Internet usage, such as microblogging on services like Twitter, have not been spared either. eMarketer estimates that there were roughly 6 million Twitter users in the US in 2008, or 3.8% of Internet users, and projects that the number will jump to 18.1 million in 2010, representing 10.8% of Internet users [3].

As The Vancouver Sun points out, when Twitter was at the leading edge and had the attention only of the digerati, it fell below the radar of those who troll the World Wide Web peddling Viagra and bogus get-rich-quick schemes. It also offered slim pickings for criminals looking to steal identities and seize control of computers. That changed dramatically when celebrities like Oprah shone a spotlight on the microblogging service and fans flocked to it. That critical mass has proven an irresistible lure for the unsavoury online crowd [4].

Previous research has tended to concentrate on individual aspects of spam, from either a technical or a socio-economic perspective. The correlations between spam in these different settings have not been investigated. The lack of coordinated research into the various types of spam is especially glaring given that spam on new technologies and services, such as Twitter and other social networks, threatens the very existence of those services, while spam continues to plague users of email and search engines in growing proportions. Exploring the similarities between the types of spam and consolidating the various research findings so as to fight spam as a whole is essential for the continued relevance of the Internet. The decline of Digg, a social bookmarking provider, is a good example of how spammers manipulate a new technology for their own ends and of the effect this has on the service [5]. The Vancouver Sun also raised, in its report, the possibility of Twitter falling into the hands of spammers [4].

This research builds on previous studies of the various types of spam and seeks ways to reconcile and consolidate their theories, using semantic algorithms to examine how the different types of spam are correlated. There is copious research on email and web spam, but much less on newer technologies such as microblogs and social networking. It is therefore necessary to establish the relationships between the types of spam in order to improve both our understanding of them and our ability to combat them.

Objectives of the Research

1. To explore the various theories and research findings that have been put forward.

2. To investigate the correlation between the various research targets using semantic algorithms in order to determine their common traits.

3. To explore the feasibility of using the research results for each type of spam interchangeably across the other types to achieve optimum results.

4. To investigate the use of artificial neural network (ANN) algorithms to enable learning about new services and their associated spam.

5. To explore the possibility of consolidated data sets and techniques covering the various types of spam in question.
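As a rough illustration of objective 2, the sketch below compares short trait descriptions of spam on each platform and reports the pairwise similarity and shared terms as candidate "common traits". It is a minimal sketch under several assumptions: the TRAITS texts are hypothetical placeholders rather than findings from the literature, the helper names (term_vector, cosine, correlate) are illustrative, and bag-of-words cosine similarity is only a simple lexical stand-in for whichever semantic algorithm the research eventually selects.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical trait descriptions standing in for summaries of prior research
# on each platform. These are placeholders, NOT findings from the literature.
TRAITS = {
    "email": "unsolicited bulk message with commercial link and forged sender",
    "web": "keyword stuffing and link farm to boost search ranking of commercial link",
    "blog": "automated comment with commercial link to boost search ranking",
    "microblog": "automated bulk message with shortened commercial link and fake followers",
}

STOPWORDS = {"a", "and", "of", "to", "with", "the"}


def term_vector(text: str) -> Counter:
    """Bag-of-words term frequencies, lower-cased, stopwords removed."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors.

    This is a lexical measure, used here as an assumed stand-in for a
    genuinely semantic similarity algorithm.
    """
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def correlate(traits: dict[str, str]) -> list[tuple[str, str, float, set[str]]]:
    """Similarity score and shared terms for every pair of platforms."""
    vectors = {name: term_vector(text) for name, text in traits.items()}
    rows = []
    for x, y in combinations(sorted(vectors), 2):
        shared = set(vectors[x]) & set(vectors[y])
        rows.append((x, y, cosine(vectors[x], vectors[y]), shared))
    # Most similar pairs first.
    return sorted(rows, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    for x, y, score, shared in correlate(TRAITS):
        print(f"{x:>9} ~ {y:<9} {score:.2f}  shared: {sorted(shared)}")
```

With these placeholder texts, the blog and web descriptions come out as the most similar pair because both mention boosting search ranking. A real study would replace the word counts with a genuinely semantic representation and the placeholders with coded findings from the reviewed papers.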

Literature Review

The review will target existing research on spam and anti-spam measures across all platforms: email, web, blogs, microblogs and social networking. It will identify the approaches used and the results obtained, and seek to investigate the correlation between these approaches.

Methodology

The study uses the information and results of prior research and attempts to:

  1. Investigate their results and targets.

  2. Analyze these investigations to seek a correlation between the various studies through semantic algorithms.

  3. Investigate the resulting correlation function and its effectiveness.

  4. Derive an ANN algorithm to combat spam that evolves with new services.

  5. Analyze the data sets used and attempt to standardize a data set for future research of this kind.
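Step 4 of the methodology calls for an ANN that can keep learning as spam moves to new services. The sketch below uses a single-layer perceptron, the simplest neural network, as a stand-in for whatever architecture the research eventually derives. The four binary features, the EMAIL_DATA and MICROBLOG_DATA examples and the Perceptron class are all hypothetical, chosen only to show incremental learning; a real study would need far richer features and models.

```python
# Hypothetical binary features assumed to be observable on every platform.
FEATURES = ["has_link", "bulk_sent", "new_account", "money_words"]

# Hypothetical labelled examples: (feature vector, 1 = spam / 0 = legitimate).
EMAIL_DATA = [
    ([1, 1, 0, 1], 1),
    ([1, 1, 1, 0], 1),
    ([0, 0, 0, 0], 0),
    ([1, 0, 0, 0], 0),
]
# A hypothetical "new service" whose spam (a link from a brand-new account)
# is not bulk-sent and so looks different from classic email spam.
MICROBLOG_DATA = [
    ([1, 0, 1, 0], 1),
    ([0, 0, 1, 0], 0),
]


class Perceptron:
    """Single-layer perceptron, the simplest artificial neural network."""

    def __init__(self, n_inputs: int, lr: float = 1.0) -> None:
        # With zero initial weights the learning rate only rescales the
        # weights and bias, so it does not change the predictions.
        self.weights = [0.0] * n_inputs
        self.bias = 0.0
        self.lr = lr

    def predict(self, x: list[int]) -> int:
        """Return 1 (spam) if the weighted sum is positive, else 0."""
        total = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1 if total > 0 else 0

    def update(self, x: list[int], y: int) -> None:
        """Perceptron learning rule: adjust weights only on a mistake."""
        error = y - self.predict(x)
        if error:
            self.weights = [w + self.lr * error * xi for w, xi in zip(self.weights, x)]
            self.bias += self.lr * error

    def train(self, data: list[tuple[list[int], int]], epochs: int = 20) -> None:
        """Online training; calling it again lets the model keep learning."""
        for _ in range(epochs):
            for x, y in data:
                self.update(x, y)


if __name__ == "__main__":
    model = Perceptron(len(FEATURES))
    model.train(EMAIL_DATA)
    print("email-only model, new-service spam:", model.predict([1, 0, 1, 0]))
    # Retrain on old and new data together to limit forgetting of email spam.
    model.train(EMAIL_DATA + MICROBLOG_DATA)
    print("after retraining, new-service spam:", model.predict([1, 0, 1, 0]))
```

Trained on the email examples alone, the model classifies the microblog pattern as legitimate; after further training on the combined data it flags it as spam. Retraining on email and microblog data together, rather than on the new service alone, is a simple guard against the model forgetting what it learned earlier.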

Research schedule

Year One: Investigate prior research and its results systematically for further analysis.

Year Two: Formulate the correlation from the earlier research and determine the results.

Year Three: Investigate the creation of a universal data set for spam testing and research.

Year Four: Derive ANN algorithms to enable learning about, and prevention of, spam in new services.

Year Five: Wrap up the research and write up the findings.

Dissemination of Results

1. Articles in various journals

2. Feedback to anti-spam and related organizations and research institutes

3. Contributions at conferences and presentations

4. Production of a comprehensive research document in book form

References

  1. Spamlaws “Spam Statistics and Facts” (http://www.spamlaws.com/spam-stats.html). Accessed 05 August 2009.

  2. Pew Internet and American Life Project “Daily Internet Activities” (http://www.pewinternet.org/Static-Pages/Trend-Data/Daily-Internet-Activities-20002009.aspx). Accessed 05 August 2009.

  3. eMarketer “Twitter Tally” (http://www.emarketer.com/Article.aspx?R=1007059). Dated 28 April 2009. Accessed 05 August 2009.

  4. The Vancouver Sun “Spammers and scammers surface on Twitter” (http://www.vancouversun.com/news/Spammers+scammers+surface+Twitter/1793966/story.html). Dated 14 July 2009. Accessed 05 August 2009.

  5. Kristina Lerman “User Participation in Social Media: Digg Study” (http://arxiv.org/PS_cache/arxiv/pdf/0708/0708.2414v1.pdf). Accessed 05 August 2009.

Co-Author

Anyone interested in co-authoring this research, please contact me at boonchuan@singapore.to. I am interested in all skills, because spam is deeply entrenched in society; we are not talking only about anti-spam technologies, about which books have already been written. Nor are we interested only in the sociological aspects of spam. We need to look at all perspectives in order to understand what spam really is and to arrive at its defining characteristics. From there, we can work out the correlation between the studies that have been done before us, examine the relationships among the various types of spam and, in doing so, look for the key to preventing spam not just in current services but in services yet to come.