Thursday, December 23, 2010

The Hammer and the Anvil

“Hephaestus created Hermes’ winged helmet and sandals, the Aegis breastplate, Aphrodite’s famed girdle and Agamemnon’s staff of office.” ~ Iliad II, Homer

The software model used dictates the architecture of the application, its use and design, and generally how easily the software may be maintained and updated through its life-cycle. Many formal software models have been developed; the most popular is the Waterfall, but there are also Agile, Big Design, Chaos, Iterative, Rapid Application Development, the Boehm Spiral and the V-Model. [i]
There are as many methodologies as there are models of software development; these include Agile, Cleanroom, Iterative, RAD, RUP, Spiral, Waterfall, XP, Lean, Scrum, the V-Model and TDD. [ii]

Each of these models has its respective strengths and weaknesses; an analysis of all of them would be far too comprehensive for this discussion. AJAX, being developed by developers trained in these models, is nevertheless closest as a methodology to the XP, Agile, RUP and Waterfall models. With respect to rich Internet applications, AJAX is more of an interface standard than anything else.

As a specification, AJAX is great for users, good for servers and excellent for computers. It’s great for users because it removes the tedious aspects of user input within forms and transitions within websites. It’s good for servers because it limits the interface with the client to only what is needed, when it is needed. It’s excellent for computers because it actively distributes processing requirements from the server to the client and can be used to reduce the cost of running the servers.
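To make that benefit concrete, here is a minimal, hypothetical sketch (not taken from any particular site) of the pattern AJAX enables: the browser fetches a small fragment in the background and redraws only one region of the page, so the user never sees a full page transition and the server never re-sends the whole document. The URL and element id are assumptions for illustration.

    // Refresh one region of the page without a full reload.
    function refreshCartSummary() {
      var xhr = new XMLHttpRequest();
      xhr.open("GET", "/cart/summary", true);   // asynchronous request to a hypothetical endpoint
      xhr.onreadystatechange = function () {
        if (xhr.readyState === 4 && xhr.status === 200) {
          // Only the summary region is redrawn; the server returns a small
          // fragment instead of a whole document, which is the bandwidth saving.
          document.getElementById("cart-summary").innerHTML = xhr.responseText;
        }
      };
      xhr.send(null);
    }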

Good software design is a very subjective notion; ask a hundred developers what good software design is and you will receive a hundred answers, each equally cryptic and unique. Upon further study, themes begin to appear. The systems development life cycle, or SDLC for short, is the fruit of such work.[iii] The goals of any systems produced using SDLC or Secure SDLC are high-quality systems that are inexpensive to maintain, operate in a secure fashion and may be enhanced cost-effectively.[iv]

AJAX is good for web applications; it’s great for instances where applications run within a client browser. That is its prime function, and its prime limitation. Software applications run on a plethora of devices, not all of which are connected to the Internet. I do not think we will ever see an implementation of AJAX-based software on devices used in the automotive industry; AJAX would not be implemented in any system that uses the MOST protocol to communicate.[v]
AJAX is useful for mobile applications on smart phones, which are now the largest emerging device market on the planet, even though there are a number of challenges for smart phones as a platform, since the client-side processing available is quite limited.[vi] As the smartphone platform increases its computing capacity, AJAX will become more common. It’s also great for any web-based application, which is another large chunk of the application industry.

AJAX will definitely aid in the forward evolution of web development on many fronts, as its use simplifies form interaction for both the end user and the server. I see AJAX as a standard requirement for web application development.

Just as the hammer, anvil and clamps were Hephaestus’ tools used to create articles and items of legendary beauty, AJAX is a standard used to create beautiful and elegant web tools for people.


References

[i] N.A. (Wikimedia, 2010) Software Development Process: Software Development Models [Online] World Wide Web, Available From: http://en.wikipedia.org/wiki/Software_development_process#Software_Development_Models (Accessed on December 23rd 2010)
[ii] N.A. (Wikimedia, 2010) Software Development Process: Software Development Models [Online] World Wide Web, Available From: http://en.wikipedia.org/wiki/Software_development_process#Software_Development_Models (Accessed on December 23rd 2010)
[iii] N.A. (Office of Information Systems, February 17 2005) Selecting a development Approach [Online] PDF Document, Available from http://www.cms.gov/SystemLifecycleFramework/Downloads/SelectingDevelopmentApproach.pdf (Accessed on December 23rd 2010)
[iv] Howe, Denis (FOLDOC, December 24th 2000) Systems Development Life Cycle (Online) Available from: http://foldoc.org/Systems+Development+Life+Cycle (Accessed December 23rd 2010)
[v] MOST Consortium (MOST, 2010) Introduction to the MOST Protocol [Online] World Wide Web, Available from: http://www.mostcooperation.com/technology/introduction/index.html (Accessed on December 23rd 2010)
[vi] N.A. (StackOverflow, N.D.) AJAX Support in Smart Phones [Online] World Wide Web, Available from: http://stackoverflow.com/questions/849850/ajax-support-in-smart-phones (Accessed on December 23rd 2010)

Friday, December 17, 2010

The Changing nature of web-development

Web services are defined as any service or function delivered via the hypertext transfer protocol (HTTP), offered via the Internet and executed remotely.[i]
Service-oriented architecture (SOA) is defined as a flexible set of design principles used during the design and integration phases of the software development life-cycle; XML and JSON are commonly used, but not required, for service coupling.[ii]
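The loose coupling described above can be shown with a small, hypothetical sketch: the consumer and the provider agree only on a JSON document shape, and neither needs to know how the other is implemented. The field names below are invented for illustration.

    // A hypothetical JSON payload as a partner service might return it over HTTP.
    var responseBody = '{ "storeId": 42, "city": "Calgary", "lat": 51.05, "lng": -114.07 }';

    // The consumer depends only on the agreed field names, not on the language,
    // platform or database behind the providing service.
    var store = JSON.parse(responseBody);
    console.log(store.city + " (" + store.lat + ", " + store.lng + ")");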
The current state of web development for the most popular web applications is based on “foundational technologies”; a foundational technology is defined as any technology that becomes a base requirement for enabling other technologies. A real-world example is how the multi-billion-dollar produce and food industry relies upon widespread, cheap and available refrigeration and the national and international power grids to maintain product stocks, both in warehouses and during shipping to the grocers. If the North American power grid fails, all grocers have blackout sales of any and all frozen produce.
With respect to the Internet, foundational technologies include systems such as DNS, multi-homed peering sites, the BGP routing protocol, widespread use of TCP/IP and, most importantly, standardized application programming interfaces such as those created by web applications to facilitate the exchange of information from system to system using SOAP, XML-RPC or JSON, with standardized methods as described at the W3C's Web 2.0 conference.[iii] The basis for web services is the web as a “platform”: regardless of client operating system or device, the web and the browser deliver the desired computing application functionality. This is often referred to as the “Cloud” or “Cloud Computing”.
Essentially, web development now has many vertically integrated services that have introduced dependencies on the platform which would previously have had to be developed and supported internally within the local application and its framework. This means that databases of information which the application would once have accessed locally have been moved to the web or removed from the application completely.
The current state of web development is dominated by the use of the cloud as the platform; Microsoft even runs commercials designed to popularize the phrase “To the Cloud”, as if it were one of their inventions. Wal-Mart's store locator relies upon Google Maps to add relevant, easy-to-use, geo-location-specific information for its customers.
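As a rough sketch of that pattern, the retailer supplies little more than a coordinate and a marker; the map tiles, imagery and geo-location logic all come from the third party. The example below uses the public Google Maps JavaScript API (v3); the coordinates, element id and store name are placeholders, not Wal-Mart's actual implementation.

    // Embed a third-party map and drop a single store marker on it.
    function showStore() {
      var storeLocation = new google.maps.LatLng(51.0447, -114.0719);   // placeholder coordinates
      var map = new google.maps.Map(document.getElementById("map"), {
        center: storeLocation,
        zoom: 14,
        mapTypeId: google.maps.MapTypeId.ROADMAP
      });
      // The marker is all the retailer contributes; everything else comes from the service.
      new google.maps.Marker({ position: storeLocation, map: map, title: "Store #123" });
    }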
Web services such as Amazon's EC2 allow the direct purchase of computing power, delivered through an API designed by Amazon over a choice of transports. The popularity of Folding@home, SETI@home and other distributed supercomputing projects demonstrates how the “cloud as a platform” is the most powerful supercomputer. Not just web applications but local applications now rely upon service-oriented architecture and distributed, dynamic capacity to accomplish both complex and amazing science.
The three critical factors in the production of anything, as defined by neoclassical economics, are “Labor, Capital and Land” [iv]; the goal of web development is to produce a usable web application that serves a dynamic and wide range of people's information needs. Often this production rests on only the two requirements of labor and capital; the “land” is virtually defined as hosted or rented space within the “Cloud”.
The “cloud as a platform” has reduced the capital and labor requirements to the point where a small team of developers can create an entire application within months. They may rely on third parties for access, authorization, authentication, service maintenance, information transfer and hosting; each of these third parties may specialize in only one particular service as a critical or dependent function.
Any relevant web site, as dictated by popularity of use via ranking services such as Alexa, demonstrates that a web application must not only function within itself, it must also depend on external services. Google uses third-party maps and satellite imagery to generate Google Maps; Microsoft relies on third parties for service delivery; even Apple's iTunes relies on PayPal for part of its payment processing, and PayPal itself rose to prominence as eBay's payments platform.
Not only does current web development require the use of web services, but the business success of any web application requires that it integrate well with other Web 2.0 sites via these now standardized interfaces. The future of web development is anyone's guess as to whether JSON, Java, HTML, PHP, JavaScript, XML or HTML5 become the standards; the only certainty is that most platforms will inter-operate to deliver a service to a client regardless of operating system or browser, and that this operation will be seamless. Not because it's cheap, nor because it's simple, but because we the users demand it.

References

[i] Richardson, Leonard; Ruby, Sam; (O'Reilly Media, 2007) RESTful Web Services P.299 ISBN: 978059652960
[ii] Bell, Michael (Wiley, February 2008) Service Analysis, Design and Architecture P.2 ISBN: 9780470141113
[iii] Sharma, Prashant (Techpluto, November 28 2008) Core Characteristics of Web 2.0 Services [Online] World Wide Web, Available from: http://www.techpluto.com/web-20-services/ (Accessed on December 16th 2010)
[iv] Samuelson, Paul A.; Nordhaus, William D.; (McGraw Hill, Yale University, 2010) Economics, Glossary of Terms ISBN: 9780073511290

Sunday, December 5, 2010

The information warehouse

A data warehouse is defined as the collective content of a collection of databases, usually normalized and sanitized in the process of moving out of an operational database into the data warehouse. Its primary use is for reporting and business intelligence.[i]
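As a small, hypothetical illustration of that “normalized and sanitized” step, a record leaving an operational database might be reshaped like this before it is loaded into the warehouse; the field names and rules are assumptions, not any particular vendor's ETL tooling.

    // Reshape an operational record for the warehouse: normalize types and units,
    // and strip personally identifiable fields before reporting ever touches it.
    function sanitizeForWarehouse(operationalRecord) {
      return {
        customerId: Number(operationalRecord.customer_id),
        country: String(operationalRecord.country || "").trim().toUpperCase(),
        orderTotal: Math.round(Number(operationalRecord.order_total) * 100) / 100,
        email: undefined   // dropped: reporting does not need direct identifiers
      };
    }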

Data warehousing on the web, or using a cloud to host the data warehouse, presents a number of issues and risks; they are listed here in no specific order.

Privacy
Within the United States, Canada and most G8 countries, legislation exists to protect the privacy of the citizens of that country. In Canada it is referred to as the Privacy Act; with respect to digital information there is a separate act referred to as PIPEDA.[ii] With respect to the operations and storage of personally identifiable information, if a company does not take “due care” to protect that information it may end up being sued by the Crown in Canada for a gross violation of the act; within industry this is known as privacy-associated risk. Gross violations include making any of that personally identifiable information available on the Internet, with or without the owner's consent.

Security
There are entire volumes written on the appropriate methods for securing a data warehouse; these include concepts such as logical, physical and technical access controls; formalized data security models such as Biba, Bell-LaPadula and others mentioned in various U.S. DoD, NIST, CMMI and ISO standards; and procedural controls for access, including separation of duties, two-factor authentication, biometrics and various other methods too numerous to list. All of these standards, guidelines and procedural concepts are designed to achieve one goal: to ensure that the risk of fraud from internal and external sources is reduced to an acceptable level [iii], so that the business may trust the data within the warehouse to be sound.

Availability
Assuming that an organization utilizes Software as a Service, or a 3rd-party provider for data warehousing, with the Internet as an intermediary, having a “Service Level Agreement” and a “Business Continuity Plan” in place with the provider, and including the “right to independent security audits”, is critical to the business that does the outsourcing. Both the ISC^2 and ISACA cite the right to independent security audits as a critical factor, in conjunction with high-level sponsorship; i.e., the board of directors signs off on the risks associated with hosting a company's data warehouse, to ensure that mitigating measures are in place for any potential business impact, including the safety of employees and clients, and to ensure the business itself is not at risk.[iv][v]

Non-repudiation
The other major issue is the implementation of technical controls and of validation and sanitation methods and processes. Whether a normalized or a dimensional approach is used [vi], the business itself must implement measures to ensure that security and privacy regulations are met; these include the use of military-grade encryption or better to assure that client data in transit is protected from unwanted disclosure and that the integrity of the data warehouse is maintained.

Regulations
One of the major risks to data warehousing, from a business standpoint, is the creation of new regulations. Sarbanes-Oxley was created to mitigate fraud on the organizational balance sheet, the GLBA exists to ensure that banks do not engage in reckless behavior with deposits, and HIPAA exists to ensure that patient data is not exposed in transit and that insurers and health providers adhere to the privacy requirements of both the public and the letter of the law.

Each of these regulations was created by the American Congress to mitigate some major legal issue arising from industry recklessness, including the fiduciary and privacy scandals of Enron, MCI WorldCom, Nortel, Time Warner, America Online, Investors Group, Bank of America and others. ISACA and the ISC^2 stipulate that any enterprise operating on a global scale enact global policies with local versions that meet local regulations; ISACA further states that the data owner must agree to any data-transit policies upon submission of any data.

The major issue with regulations, and their respective risk to a data warehouse, is that acceptable-use and client notifications must state what will happen to the data disclosed by the end user, and the business must also comply with the regulations of every country in which it operates.

This creates a compound issue when a company collects client information in North America and stores it in China or India for more efficient processing. The major problem is that China and India do not have stellar records when it comes to upholding American privacy legislation. It is therefore up to the business to ensure that American legislation and requirements are met within the operations of the third party in its country of residence. This is difficult, and it requires both strategic oversight and governance from the parent organization.

The nature of organizational change is that global operations will maintain an executive board and a security steering committee who meet to determine the appropriate behaviors to mitigate the above risks and to meet audit, legal and policy requirements. The person responsible for these operations in most organizations is usually the CIO, CISO or COO. The rise and prominence of information, and of its value in the Internet age, has created many new and complicated issues; navigating these waters with clarity adds to both the business's value and its competitiveness.

The effect of these new requirements on a web-based data warehouse is that any 3rd-party provider of data-warehousing services must meet the privacy regulations and legal requirements of both the country of origin and the country of residence.

References
[i] N.A. (Wikimedia, 2010) Data Warehouse [Online] World Wide Web, Available from: http://en.wikipedia.org/wiki/Data_warehouse (Accessed on December 5th 2010)
[ii] GoC (Canadian Parliament, November 14th 2010) Personal Information Protection and Electronic Documents Act [Online] World Wide Web, Available from: http://laws.justice.gc.ca/eng/P-8.6/page-1.html#anchorbo-ga:l_1 (Accessed on December 6th 2010)
[iii] Tipton, Harold F. (CRC Press, 2010) Official ISC2 Guide to the CISSP CBK 2nd ed.
[iv] GoC (Canadian Parliament, November 14th 2010) Personal Information Protection and Electronic Documents Act [Online] World Wide Web, Available from: http://laws.justice.gc.ca/eng/P-8.6/page-1.html#anchorbo-ga:l_1 (Accessed on December 6th 2010)
[v] ISACA (ISACA, 2009) CISM Review Manual 2010 ISBN: 978-1-60420-086-7
[vi] Codd, E.F. (ACM, 1970) Communications of the ACM, "A Relational Model of Data for Large Shared Data Banks" [Online] PDF Document, Available from: http://portal.acm.org/citation.cfm?doid=362384.362685 (Accessed on December 6th 2010)

The forest and the trees



According to the ISC^2, under the information security governance and risk management section of the common body of knowledge, there exist a number of rules within the ISC^2 code of ethics [i]:

The “no free lunch rule”: “Assume that all information and property belongs to someone.”

comScore states that e-commerce spending neared 34 billion dollars in the first quarter of 2010.[ii]

Of the 256,000,000 websites on-line [iii], as of 2007 there were 20,000,000 using PHP out of 102,400,000 total domains at the time [iv]. If we assume these trends have remained constant, we may extrapolate that roughly 35% to 50% of all web sites on-line use PHP scripting [v].

The TIOBE index for 2010 states that PHP falls just behind C/C++ and Java in popularity.[vi]

Most companies which engage programmers to develop applications for them retain the intellectual property rights.[vii] These rights and applications are the tools used to extract value from e-commerce.

Web development is by its very nature open; the main issue business managers have with a web-facing presence is that it exposes the company to a degree of risk, including the risk of theft of IP.[viii] Google had its intellectual property removed by force and order of the Chinese government because of a politician's disdain for his on-line presence. Since Google is worth billions of dollars, this theft of IP is the equivalent of stealing a baker's oven, or a delivery company's planes, trains and vans.

Legal considerations aside, the future of web development is open, but in a validated, escaped, vetted and verified manner. As applications become more dependent on web-based technologies, such as the games in Facebook, or the way salesforce.com can pull contact information from LinkedIn, the sites that work with one another use the number of users as the metric from which to derive economic value.

People often say that Facebook is worth so many billions of dollars based on the data the web site retains; however, real asset valuation is usually based on revenue plus operations and management plus cash in hand and holdings. Far too often we as investors assign value to worthless ideas. Facebook is based on enabling a distributed community of people to tag metadata within digital photos; this idea is formally patented and coded into the platform that is Facebook.

The future of web development will have greater interconnectivity; however, this will be offset by the need to enforce privacy legislation and both local and non-local security interests.

The way future web sites communicate may involve active security testing as part of the web site's operations and API development; DNS-based secure validation may also be required for all domains. Further to this, we will also see a rise in privacy violations by companies, since such violations are rarely enforced or legally punished.

I see a forest of many brilliant trees with fireproof bark whose branches only cover certain valuable areas; the mycelium of this forest is ironclad and paid for.

Future web sites will use service-level-based connections agreed upon by the various data holders, such as Facebook, Google and the like, and they will probably be fortified by in-line detection of valid code and transactions, mired in legal requirements and legislation, and audited by many security personnel.

As the Internet grows and global adoption continues to rise, the future of website development remains very open, while the nature of the back end of websites is becoming far more closed and restricted. This is to protect the investment of both human and real capital in the development of these most brilliant tools.


[i] Tipton, Harold F. (CRC Press, 2010) Official ISC2 Guide to the CISSP CBK 2nd ed. P.495
[ii] N.A. (comScore, Marketing Charts) Q1 E-commerce Spending Rises 10% [Online] World Wide Web, Available from: http://www.marketingcharts.com/direct/q1-e-commerce-spending-rises-10-12982/?utm_campaign=rssfeed&utm_source=mc&utm_medium=textlink (Accessed on December 5th 2010)
[iii] N.A. (Netcraft) Web Server Survey [Online] World Wide Web, Available from: http://news.netcraft.com/archives/category/web-server-survey/ (Accessed on December 5th 2010)
[iv] N.A. (php.net) Usage Stats [Online] World Wide Web, Available from: http://php.net/usage.php (Accessed on December 5th 2010)
[v] Seguy, Damien (nexen.net, 2008) All Statistics Related to PHP [Online] World Wide Web, Available from: http://www.nexen.net/chiffres_cles/phpversion/ (Accessed on December 5th 2010)
[vi] N.A. (TIOBE Software) TIOBE Programming Index for November 2010
[vii] Nicholson, Andrew (FindLaw, Australia) Without Employment Contracts Employers Risk Losing IP [Online] World Wide Web, Available from: http://www.findlaw.com.au/articles/2269/without-employment-contracts---employers-risk-losi.aspx (Accessed on December 5th 2010)
[viii] Thomson, Iain (V3.co.uk, November 29th 2010) WikiLeaks Cable Showed that China Politburo Ordered Google Hack [Online] World Wide Web, Available from: http://www.v3.co.uk/v3/news/2273507/wikileaks-google-china-cables (Accessed on December 5th 2010)

Wednesday, November 17, 2010

Objects and the Internet



Object-oriented programming has many benefits, the primary of which is object reuse.[i] Object reuse is the primary reason application programming languages such as Java and C/C++ dominate the web and platform development industry. Object reuse allows data structures and forms to evolve in a manner that reduces the overall time required and increases the reliability of a given application, while compartmentalizing and simplifying the development process; this in turn reduces development costs, allowing businesses to achieve quantifiable results faster.
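A short, hypothetical sketch of that reuse argument: one object definition is written and tested once, then extended rather than rewritten, so the shared behaviour is maintained in a single place. The types below are invented for illustration.

    // A base object written once...
    function Account(owner, balance) {
      this.owner = owner;
      this.balance = balance;
    }
    Account.prototype.deposit = function (amount) {
      this.balance += amount;
    };

    // ...and reused: SavingsAccount inherits everything Account provides and adds one method.
    function SavingsAccount(owner, balance, rate) {
      Account.call(this, owner, balance);
      this.rate = rate;
    }
    SavingsAccount.prototype = Object.create(Account.prototype);
    SavingsAccount.prototype.addInterest = function () {
      this.deposit(this.balance * this.rate);   // reuses the tested deposit() code
    };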

The Document Object Model (DOM) [ii] is a language-neutral interface that allows JavaScript, XHTML and DHTML to function in a uniform manner regardless of platform or hosting infrastructure; they are expected to play well together anywhere.

The great possibilities for JavaScript objects are those that have been developed and used by companies such as Amazon, which was awarded a famous patent for “1-click” purchases.[iii] The 1-click patent and process rely heavily on OOP-based ideas such as data structures implemented with JavaScript and server-side objects. The 1-click method is also a very competitive asset, currently licensed by Amazon to Apple for use in iTunes.

The possibilities for object use within XHTML, XML and CSS, using the DOM and DHTML as standards, are endless; AJAX automates form filling, object reuse and, most importantly, browser object and content manipulation.[iv] Thus the user experience may be improved and greater efficiencies realized by reducing the number of keystrokes required to make an on-line purchase.
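A hedged sketch of that keystroke-saving idea: when the shopper finishes typing a postal code, an asynchronous lookup fills the city and province fields through the DOM, with no page reload. The endpoint, response fields and element ids are assumptions for illustration.

    document.getElementById("postal").onblur = function () {
      var xhr = new XMLHttpRequest();
      xhr.open("GET", "/lookup?postal=" + encodeURIComponent(this.value), true);
      xhr.onreadystatechange = function () {
        if (xhr.readyState === 4 && xhr.status === 200) {
          var place = JSON.parse(xhr.responseText);
          // Manipulate the browser's objects directly: the shopper never types these values.
          document.getElementById("city").value = place.city;
          document.getElementById("province").value = place.province;
        }
      };
      xhr.send(null);
    };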

Newer frameworks such as Ruby on Rails utilize AJAX to reduce the time required to deliver a web application from a few weeks to a few minutes.[v] The best example of these concepts in action is Google Maps: this web application is based solely on AJAX and object-oriented concepts implemented in the browser, accessing data in the cloud.

The possibilities for objects on the Internet are endless; they will always be used in the manner in which they were designed, and will replicate and resemble their respective grandparents from OOP-based languages such as C/C++, Java and others. The difference is that they will facilitate the decentralization of processing requirements and the adoption of remote computing resources via the ever-present web browser, by acting as the agent of communication for distributed applications.


[i] Vachharajani, Manish; Vachharajani, Neil; August, David I.; (Princeton University, 2003) A Comparison of Reuse in Object-oriented Programming and Structural Modeling Systems [Online] PDF Document, Available from: iberty.princeton.edu/Publications/tech03-01_oop.pdf (Accessed on November 18th 2010)
[ii] N.A. (W3C, January 19th 2005) The Document Object Model [Online] World Wide Web, Available from: http://www.w3.org/DOM/ (Accessed on November 18th 2010)
[iii] Hutcheon, Steven; (Sydney Morning Herald, May 23rd 2006) Kiwi Actor vs Amazon.com [Online] World Wide Web, Available from: http://www.smh.com.au/news/technology/kiwi-actor-v-amazoncom/2006/05/23/1148150224714.html (Accessed on November 18th 2010)
[iv] Garrett, Jesse James; (Adaptive Path, February 18th 2005) Ajax: A New Approach to Web Applications [Online] World Wide Web, Available from: http://www.adaptivepath.com/ideas/essays/archives/000385.php (Accessed on November 18th 2010)
[v] Hibbs, Curt; (O'Reilly, ONLamp.com, June 9th 2005) Ajax on Rails [Online] World Wide Web, Available from: http://onlamp.com/pub/a/onlamp/2005/06/09/rails_ajax.html (Accessed on November 18th 2010)

Tuesday, November 16, 2010

The new eCommerce paradigm


Traditional workflows have been used by operations management and business for decades. Current standard business workflows include e-mail, web queries, e-commerce and transactions; these exist in addition to traditional business methods such as process and operations management.[i]
Web-based businesses use the Internet to conduct business; as such, the only available method to collect, verify and maintain customer relationships is transaction-based e-mail and web forms used to generate both direct sales and sales leads. These functions form the basis of the ERP, CRM and e-commerce industries.
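As a minimal sketch of that collect-and-verify step (not the stack any particular business uses), a server-side endpoint might accept a lead-capture form and reject obviously invalid submissions before they reach the CRM. The path and field names are invented; a real deployment would also need HTTPS, persistence and the regulatory controls discussed below.

    // A tiny Node.js endpoint that receives a web form and performs basic verification.
    var http = require("http");
    var querystring = require("querystring");

    http.createServer(function (req, res) {
      if (req.method === "POST" && req.url === "/leads") {
        var body = "";
        req.on("data", function (chunk) { body += chunk; });
        req.on("end", function () {
          var form = querystring.parse(body);
          if (!form.email || !/@/.test(form.email)) {   // basic verification only
            res.writeHead(400);
            return res.end("invalid email");
          }
          // At this point the record would be handed to the CRM / ERP pipeline.
          res.writeHead(200);
          res.end("lead accepted");
        });
      } else {
        res.writeHead(404);
        res.end();
      }
    }).listen(8080);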

Data collection is the starting point for e-commerce based businesses; once data and information have been gathered, the data and its use must be governed according to local laws and regulations. This includes maintaining corporate policies that meet the local regulations of any country of operations, such as the ISO series of standards (22307:2008 and 27002).[ii]


The web form initiates the collection and authentication of information from clients and, in some cases, suppliers and merchants. Amazon uses nothing but web forms to conduct its entire business, with quarterly net sales of around 7.5 billion dollars.[iii]


With web forms, businesses collect, manage and maintain any and all client- and merchant-related workflows; as such, they have created a new paradigm for business, where information acquired at a diminutive cost may be used to generate profit from traditional business on an exponential scale.
The new paradigm for business in the 21st century is based in the cloud and uses the Internet as its engine.




References


[i] Cai, Ting; Gloor, Peter; Nog, Saurab; (Dartmouth College, May 14th 1996) DartFlow: A Workflow Management System on the Web Using Transportable Agents [Online] PDF Document, Available from: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.57.3354&rep=rep1&type=pdf (Accessed November 13th 2010)
[ii] N.A. (Wikipedia) ISO/IEC 27000 Series [Online] World Wide Web, Available from: http://en.wikipedia.org/wiki/ISO/IEC_27000-series (Accessed November 13th 2010)
[iii] N.A. (Amazon Inc., October 21 2010) Amazon Announces Third Quarter Earnings [Online] World Wide Web, Available from: http://phx.corporate-ir.net/phoenix.zhtml?c=97664&p=irol-newsArticle&ID=1485834&highlight= (Accessed November 13th 2010)

Friday, November 5, 2010

Predictions 1.2


The impact of the increased availability of all human knowledge on the Internet is only beginning to be felt by modern society. Ray Kurzweil refers to the advance and paradigm shift of the Internet containing all human knowledge as the “Singularity”; it is defined as a point in the near future when machines will have greater intelligence than their human masters.[i]

The primary example of H.G. Wells' oracle from The Time Machine can be seen in Wikipedia directly; this, together with the Web 2.0 approach in which machines may now understand the content and data presented via the HTML 4.01 and HTML 5 standards, means that anyone with an Internet connection has access to the “sum of all knowledge”. The result is the “Law of Accelerating Returns” [ii], whereby technologies not previously related to the Internet see rapid returns and breakthroughs with respect to innovation.

Some brief examples of rapid technological evolution include computers and computational power; however, areas such as engineering are also seeing exponential returns with technologies such as direct metal laser sintering (DMLS), which leverage the availability of computational power and information.[iii] This foundational technological revolution will impact both industrial and non-industrial product development and manufacturing.

The most apparent shift is now occurring in print media, where access to all print media online has spawned the “ebook” reader, and companies such as Apple have capitalized with products like the iPad; your magazines and newspapers are no longer physical but digital. I recently saw William Gibson lecture on how the publishing industry must adapt to a new paradigm of point-of-use printing, in which a book purchase would have the vendor produce the hard or soft copy on site with modern printing machines [iv], not only as a matter of necessity but also as a means to reduce the global carbon footprint. He also stated that modern science-fiction authors face far too many variables in society to draw the arcs used as storytelling tools. For those of you unfamiliar with his work, he coined the term “cyberspace”.

The rate of technological change will increase; following this change, the rate of differentiation with respect to complexity and innovation in all facets of human life will also increase. These rates are driven by Moore's Law, the Law of Accelerating Returns and the availability of all human knowledge on the Internet. They will allow traditional fields of science to evolve in an exponential manner: medicine may finally conquer ageing and all related diseases through projects such as the Methuselah Foundation [v] and the Immortality Institute [vi], and culture will become more granular and fragmented to meet the individual desire for unique consumption. Even now, within academia, the traditional humanities are being supplanted by modern neuro-psychiatry.

These shifts are occurring because the Internet facilitates communication amongst otherwise separate individuals; since humans are far better at problem solving in groups, once a group of sufficient size has access to all known information on a given subject, the rate of change for that subject becomes proportional to the group size.

The other change is that privacy will become nonexistent through the use of search-engine data and social networks; the current impacts, concerning the public disclosure of personal information, are being resolved in various courts.[vii]

The impact of these changes on myself and my children will be that we will live longer, healthier lives with fewer resources and a smaller, more conscious ecological and technological footprint; our work will be different in that we will specialize in fields that are considered non-traditional; we will consume our media in a self-directed fashion, consuming only the media we are interested in; and we will have less privacy than our grandparents.

References

[i] Kurzweil, R.; Viking Adult; The Singularity Is Near (September 22, 2005) ISBN: 0670033847
[ii] Kurzweil, R.; Viking Adult; The Singularity Is Near (September 22, 2005) ISBN: 0670033847
[iii] N.A.; 3T RPD Ltd. Direct Metal Laser Sintering [Online] Video (26 Oct 2009)
Available from: http://www.youtube.com/watch?v=BZLGLzyMKn4&feature=related (Accessed November 5th 2010)
[iv] Gibson, William; Young, Nora; CBC Full Interview of William Gibson on Zero History [Online] MP3 File, Available from: http://www.cbc.ca/spark/2010/10/full-interview-william-gibson-on-zero-history/ (Accessed on November 5th 2010)
[v] N.A. Methuselah Foundation, About the Methuselah Foundation [Online] World Wide Web, Available from: http://www.mfoundation.org/index.php?pagename=mj_about_mission (Accessed on November 5th 2010)
[vi] N.A. Immortality Institute, About the Immortality Institute [Online] World Wide Web, Available from: http://www.imminst.org/about (Accessed on November 5th 2010)
[vii] Wright, Marie A.; Kakalik, John S.; ACM, The Erosion of Privacy [Online] PDF Document, Available from: http://portal.acm.org/citation.cfm?id=270913.270922 (Accessed on November 5th 2010)

Thursday, November 4, 2010

Programming the Internet 1.1


The deep web and its impact in relation to search engines, academia and commercial sites can be summarized by the presence of new commercial products aimed directly at the information management sector.

The issue with the deep web is that the various commercial bodies that contribute to the Internet at large also maintain large repositories of competitive knowledge, or repositories that have no external links or connections; these may be described as collections of “trade secrets” and “information”. The primary example would be the recipe for fries at McDonald's: you'll see the nutritional make-up of fries on McDonald's web site, but you'd be hard pressed to find the details of their preparation anywhere. These commercial bodies will engage in public or semi-public communication with their commercial partners via the Internet; but just as not all radio communication is intended for public consumption [i], neither are all web servers.

Michael K. Bergman wrote:

“Traditional search engines create their indices by spidering or crawling surface Web pages. To be discovered, the page must be static and linked to other pages. Traditional search engines can not "see" or retrieve content in the deep Web — those pages do not exist until they are created dynamically as the result of a specific search. Because traditional search engine crawlers can not probe beneath the surface, the deep Web has heretofore been hidden.”[ii]

The only available method to search the “deep web”, as stated by Bergman, is to conduct direct searches of non-linked sites utilizing cross-referencing technology such as that cited in Cyveillance's studies [iii]. Realistically, the only method that would not rely on educated guesses would involve using a network scanning utility such as nmap, Nessus or Metasploit to crawl the ARIN-based address space for all values from 0.0.0.0 to 255.255.255.255 and index those findings against the various engines used, or leveraging existing heat maps from CAIDA to establish the publicly routable space as the primary scope [iv] for indexing. The major issue is that this approach faces a number of legal barriers, since in many countries “port scanning” can constitute a crime, and building a stateful web crawler for the entire space poses a real technical and fiduciary challenge.
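Purely as an illustration of that brute-force idea (and only against addresses one is authorized to probe, for the legal reasons noted above), the discovery step might look like the hypothetical sketch below: walk a tiny address block, note which hosts answer on port 80, and record them for later indexing. The addresses come from the reserved documentation range.

    // Probe a small, authorized address block for web servers that answer on port 80.
    var http = require("http");

    function probe(address) {
      var req = http.request({ host: address, port: 80, method: "HEAD", path: "/" },
        function (res) {
          console.log(address + " answered with status " + res.statusCode);
        });
      req.on("error", function () { /* hosts that do not answer are simply skipped */ });
      req.setTimeout(2000, function () { req.abort(); });
      req.end();
    }

    for (var i = 1; i <= 10; i++) {
      probe("192.0.2." + i);   // 192.0.2.0/24 is reserved for documentation examples
    }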

The issue with search engines, as stated in Bergman's paper, is “quality”: although there is a significant quantity of deep-web sites, most of which are topical databases, search engines are more concerned with the quality and accuracy of results than with their quantity. [v]

Academia is concerned primarily with quality, accuracy of information and relevance of topics over quantity; as such, various search-engine providers are offering services that cater to the volume of knowledge contained within the traditional houses of excellence.[vi] These include publishers such as Prentice Hall, Springer, Deitel, the IEEE, the ACM and other academic organizations.

Commercial entities desire competitive intelligence in addition to labour and resources; the nature of competitive intelligence is that it is based primarily on what is known about one's competition. Next to unintentional disclosure, the volume of information available online via both traditional search engines and the deep web, such as import and landing databases from customs, would allow any corporate entity to determine a number of characteristics of its competition that would otherwise remain unknown. Businesses already exist to mine this volume of information, and competitive intelligence is an emerging market in which various companies offer services of this nature.[vii]

The effect of the deep web on future search engines will be a granular focus on content and on content analysis. As the deep web grows and contains more information of value, search engines will have to develop non-index-based databases built on tertiary page characteristics and information such as metadata, referenced via intelligent page-capture techniques such as machine learning [viii]. This future, although currently dark, will be illuminated by the businesses that stand to gain the most from data mining, business intelligence and information capture.


References



[i] Sokol, Brett; Miami New Times, Espionage Is in the Air [Online] World Wide Web, Available from: http://www.miaminewtimes.com/2001-02-08/news/espionage-is-in-the-air/ (Accessed on November 4th 2010)
[ii] Bergman, Michael K.; White Paper: The Deep Web: Surfacing Hidden Value [Online] World Wide Web, Available from: http://quod.lib.umich.edu/cgi/t/text/text-idx?c=jep;view=text;rgn=main;idno=3336451.0007.104 (Accessed on November 4th 2010)
[iii] Murray, Brian H.; Moore, Alvin; Cyveillance, Sizing the Internet: A White Paper [Online] PDF Document, Available from: http://www.cs.toronto.edu/~leehyun/papers/Sizing_the_Internet.pdf (Accessed on November 4th 2010)
[iv] N.A.; The Cooperative Association for Internet Data Analysis; Measuring the Use of the IPv4 Space with Heat Maps [Online] World Wide Web, Available from: http://www.caida.org/research/traffic-analysis/arin-heatmaps/ (Accessed on November 4th 2010)
[v] Bergman, Michael K.; White Paper: The Deep Web: Surfacing Hidden Value [Online] World Wide Web, Available from: http://quod.lib.umich.edu/cgi/t/text/text-idx?c=jep;view=text;rgn=main;idno=3336451.0007.104 (Accessed on November 4th 2010)
[vi] N.A. Google Inc. Google Scholar Search Engine [Online] World Wide Web, Available from: http://scholar.google.com (Accessed on November 4th 2010)
[vii] N.A. ImportGenius. About Import Genius [Online] World Wide Web, Available from: http://www.importgenius.com/about.html (Accessed on November 4th 2010)
[viii] Mitchell, Tom M.; CMU; July 2006, The Discipline of Machine Learning [Online] PDF Document, Available from: http://www.cs.cmu.edu/~tom/pubs/MachineLearning.pdf (Accessed on November 4th 2010)