Wednesday 30 January 2013

Data-mining workshop No.1


Data-mining is the computational analysis of information to uncover patterns that exist within large datasets. These patterns can be visualised with various tools and techniques, such as line and pie charts, gauges and maps, which are a fantastic way to uncover abnormalities and trends within datasets.

However, before we can start the visualisation process we must first analyse the data, and for that we need to capture some data, either by gaining access to existing sources or by uncovering new sources that are relevant to our needs.

Data-scraping

One technique is to convert an existing source of digital data into another form that can then be processed further; this is called data-scraping. These sources of data may be found on public-domain information websites, in publicly released Portable Document Format (PDF) files and in various other digital formats that are beyond the scope of this blog.

It is important to stress that obtaining copyrighted or protected material without consent may result in legal action against you. Always seek permission and thoroughly research the terms and conditions that the author (whether an individual or a government department) attaches to the use of the data.
For further discussion about the morals surrounding data-scraping a good article to read is:


Obtaining Open Datasets

Publicly available open datasets will provide the legal source of data that we will use to demonstrate the technique over the course. We will start with Microsoft’s Excel spreadsheet program, and in a later post we will use the web-based Google spreadsheet interface.

As mentioned previously, the two most common forms of digitally published data are web pages and PDF files.

In this first tutorial we will look at how we can use Microsoft Excel to access data found in public-domain web pages. We will focus only on getting the data; the further techniques of data cleaning and data visualisation are necessary, but they are complex and need further discussion at a later date.

Working on the HTTP layer

Web pages can be built using a variety of competing computer languages, which all rely on the Hypertext Transfer Protocol (HTTP). HTTP, or “http://”, should be familiar to most people as it appears in the address bar of most modern browsers.

HTTP is the base that allows applications to communicate on the Internet and this interaction is what forms the World Wide Web.
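
As a rough illustration of where HTTP sits in an address, the sketch below splits a URL into its parts using Python's standard library. The address itself is a made-up example and not one used in this workshop.

```python
from urllib.parse import urlsplit

# Hypothetical address, used only for illustration.
url = "http://www.example.com/data/tables.html"

parts = urlsplit(url)
print(parts.scheme)  # the protocol the browser will use
print(parts.netloc)  # the server the request is sent to
print(parts.path)    # the page requested from that server
```

The first part, the scheme, is the "http" that the browser displays in front of the address.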

When we view the World Wide Web through a browser, we are viewing that particular browser's interpretation of Hypertext Markup Language (HTML) and its supporting technologies (CSS, Flash media, etc.).

During interpretation the web browser organises and displays the various elements of HTML, which are written as tags enclosed in "< >" symbols.

There is a multitude of tags to select from, and possibly an infinite number of ways to arrange them to create a web page. In our context of data-scraping, the two that we will focus on are:

<TABLE>
<DIV>

The Table and Div tags allow the web page to be created as a structure (or grid) and allow digital publishers and editors to place content elements (text, images, links, etc.) inside it to enhance the visual arrangement. This is much the same as using a textbox inside a word-processing program, such as Microsoft Word or Open Office Writer, to adjust the flow of text and give added meaning or emphasis for the reader.
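
To give a feel for why the Table tag matters for data-scraping, here is a minimal sketch in Python (not the Excel method this tutorial uses) that reads the cells out of a table. The sample page, the class name TableScraper and the figures in the table are all made up for illustration, and the sketch assumes simple, well-formed HTML.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collects the text of every table cell, one list per table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row currently being read, if any
        self._cell = None   # text pieces of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Made-up sample page, standing in for a real public-domain web page.
SAMPLE_HTML = """
<div id="content">
  <table>
    <tr><th>City</th><th>Population</th></tr>
    <tr><td>Springfield</td><td>1000</td></tr>
    <tr><td>Shelbyville</td><td>2000</td></tr>
  </table>
</div>
"""

scraper = TableScraper()
scraper.feed(SAMPLE_HTML)
for row in scraper.rows:
    print(row)
```

Each table row becomes a list of cell values, which is roughly the grid shape that a spreadsheet expects.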

Tuesday 29 January 2013

#DJcamp Twitter googlesheet fix

#DJcamp2013 update

To ensure that your Twitter Google spreadsheet is working properly, please follow the steps below:

Once inside the Google spreadsheet, please navigate using the menus:

1. Tools > Script Editor 


A new window will appear; inside this new window, navigate the menu and activate:

2. Resources > Current script's triggers


Once inside the Script Editor window please select:
3. Triggers > Current script’s triggers
4. Add a new trigger. 
5. Select ‘collectTweets’ from the first dropdown menu and change from ‘spreadsheet’ to ‘Time-driven’
6. Save

As we all got authenticated in the group, or you have access to the keys that I distributed, all that remains is to change the #hashtag you want to use and 'Run' the Google spreadsheet again.

I will be blogging the workshop about sourcing and 'scraping' open datasets on here tomorrow, but if anyone wants to have a Google Hangout then I can arrange it.

Thanks Guys!

Sunday 14 October 2012

Orbital Jump - Red Bull Stratos

The influence of the 1960s will long be remembered in history, and from our modern perspective it is easy to forget how important this period was. It was a time of radical ideals and social awakenings, facilitated by radio technology, which spawned a new political and cultural undertone through music and led to some of the first mass anti-war protests in modern society.

Aviation was one of the fields that was completely revolutionised during the 1960s, most publicly with President Kennedy and his momentous ‘We choose to go to the moon’ speech at Rice University in 1962. The speech had thousands of engineers pushing the boundaries of what was possible for both man and machine, with one project taking man to his absolute limit, to the place known as ‘Black sky’.

This project was called Excelsior. Although it was initiated in 1958, it reached its conclusion on August 16th 1960, when Colonel Joe Kittinger became the first human to successfully jump from an altitude of 102,800 feet (31,300 m), a record that helped engineers develop the next generation of space suits and parachute systems. The project tested a parachute system that would control the descent of a human safely after a high-altitude ejection, because tests with dummy bodies and traditional parachutes had revealed a problem of uncontrollable spinning. The new system consisted of a small stabilizer parachute and a main parachute that would be deployed automatically at the desired point along the descent trajectory.

Now, after 50-something years, this project is being challenged!

The challenge is not backed by military funding or government agencies, as so many important discoveries have been before, but by a project that is supported by industry and is prepared to see if man can once again push the envelope of what is possible.

The Red Bull Stratos project is pushing the boundary of freefall flight and will once again send man back to the edge of space. This man is Felix Baumgartner, who, supported by a team of experts (including Colonel Joe Kittinger), is planning to step out of a stratospheric balloon at 120,000 feet and make the first supersonic freefall flight: the first time a human has passed the sound barrier unaided. This attempt to dare atmospheric limits holds the potential to provide valuable medical and scientific research data for future pioneers.

Preliminary data from the International Air Sports Federation (FAI) covers a successful test jump by Felix Baumgartner on the 15th of March 2012 and shows how the statistics measured up:

  • Altitude reached: 71,581 feet
  • Parachute opened at: 7,890 feet
  • Freefall time: 3 minutes and 33 seconds
  • The fastest ascent rate of the capsule: 1,200 feet per minute (estimate)
  • Speed reached in freefall: 364.4 miles per hour

The risks are momentous, as proven by an unsuccessful project in 1966 by Nick Piantanida, who sadly lost his life in the attempt to reach further into the ‘Black Sky’. I am sure, though, that Felix is more than prepared for all possible scenarios!

    If you would like to follow his attempt, you can do so through the following channels:


    Web: www.redbullstratos.com/
    Facebook: www.facebook.com/redbullstratos
    Twitter: http://twitter.com/RedBullStratos

    Further reading:

    Book links:

    The 1960s Cultural Revolution (Greenwood Press Guides to Historic Events of the Twentieth Century) by John C. McWilliams

    Magnificent Failure: Free Fall from the Edge of Space [Hardcover] by Craig Ryan

    Come Up and Get Me: An Autobiography of Colonel Joe Kittinger [Paperback] by Joe Kittinger

    Thursday 2 August 2012

    Today is the day I learnt to Babel


    "BABEL FISH :

    The Babel fish is small, yellow and leech-like, and probably the oddest thing in the Universe. It feeds on brainwave energy received not from its own carrier but from those around it, It absorbs all unconscious mental frequencies from this brainwave energy to nourish itself with. the practical upshot of this is that if you stick a Babel fish in your ear you can instantly understand anything said to you in any language.
    Now it is such a bizarrely improbable coincidence that anything so mind-bogglingly useful could have evolved purely by chance that some thinkers have chosen to see as a final and clinching proof of the non-existence of God. The argument goes like this : "I refuse to prove that I exist", says God, "for proof denies faith, and without faith I am nothing."
    "But", says Man, "the Babel fish is a dead giveaway isn't it? it could not have evolved by chance. it proves you exist, and so therefore, by your own arguments, you don't. QED."
    "Oh dear", says God, "I hadn't thought of that," and promptly vanishes in a puff of logic.
    "Oh that was easy" says Man, and for an encore goes on to prove that black is white and gets himself killed on the next zebra crossing.
    Meanwhile, the poor Babel fish, by effectively removing all barriers to communication between different races and cultures, has caused more and bloodier wars than anything else in the history of creation."

    - excerpt from The Hitchhiker's Guide To The Galaxy by Douglas Adams

    Well while the Babel fish continues to operate in another part of the universe (and quite possibly another moment of time), we mere mortals have to continue to face the daily challenges that modern life throws at us to make sure we stay interested.

    For me, today brought a new challenge: finding a service that would allow my department to publish an online survey in Arabic. Since we are a predominantly English-speaking department this was a great challenge, and one that threw up some interesting bits of discussion.

    The problem:

    Generally we use Survey Monkey (Surveymonkey.com) to produce and manage our online surveys, as the service is a great web app and simple to use. However, although Survey Monkey proclaims that:

    "We Support Any Language"


    The service later reports:

    "We do not provide support in creating language formats that go from right to left, like Arabic or Hebrew."


    Great! Not sure what that means, but I will give it a go!

    Internationalization

    As the world becomes more connected and communication becomes a continual stream of information, this information needs to be organized and prepared in order to be effective. In technology, one technique used for this is called 'internationalization'.

    Survey Monkey supports this technique by providing a series of prepared messages and lists the following elements that have been 'internationalized':
    • Navigation Buttons
    • Add a Comment Field option
    • Demographic Information question: Default labels are translated
    • Text Validation error message
    • Require Answer to Question error message
    • Date and Time validation error message
    • Collector Password Restriction feature: Prompts and default messages are translated
    • Thank You Page collector feature: Default message is translated
    • Popup Invitation Collector: Default buttons and message are translated
    Windows OS activated language control
    This leaves the user's operating system to take care of the actual creation of the text to be inserted into the survey. On a Windows-based operating system (OS), the language control is the icon at the bottom right-hand side of the screen, which when activated looks something similar to this:

    Now the remaining issue is the flow of text, as Arabic reads from right to left. For this problem I have requested assistance from a professional Arabic translator, who will proof the survey as I create it during an online session tomorrow.
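
To make the right-to-left issue a little more concrete, here is a small Python sketch that guesses the reading direction of a piece of text from its first strongly directional character. The function name is my own invention, the rule is a simplified heuristic rather than the full Unicode bidirectional algorithm, and it has nothing to do with how Survey Monkey works internally.

```python
import unicodedata


def is_right_to_left(text):
    """Guess direction from the first strongly directional character."""
    for ch in text:
        direction = unicodedata.bidirectional(ch)
        if direction in ("R", "AL"):  # Hebrew-style or Arabic letter
            return True
        if direction == "L":          # left-to-right letter, e.g. Latin
            return False
    return False  # no strong character found; assume left-to-right


print(is_right_to_left("مرحبا"))   # Arabic for "hello"
print(is_right_to_left("survey"))
```

A check like this only says which way the text should flow; the page displaying the survey still has to lay the text out in that direction.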

    *this post will continue in another session hopefully tomorrow if time permits


    Further reading:
    Unicode Demystified: A Practical Programmers Guide to the Encoding Standard
    Multilingual Information Retrieval: From Research To Practice
    The Complete Hitchhiker's Guide to the Galaxy: The Trilogy of Five


    BLOG FLASH!! Breaking news

    Creating online surveys in other languages

    A problem came up at work this morning about how to create a survey in the Arabic language! The problem was solved within 20 minutes after we placed a 'shoutout' on Twitter, which Mr Damien Radcliff kindly retweeted to his network.

    The issue is actually solved by the base operating system that the user is using to gain access to the internet, so I have decided to create a blog post tonight going through the issue, with practical help on how to solve the technical side of the problem, as writing in Arabic is a whole other thing.

    Sunday 29 July 2012

    Digital Subtitle Fail


    The virtual mediasphere is being analysed and (re)defined at a blistering pace, and yet in the drive to hook in social media, standards are beginning to slip in traditional media organisations.

    One of these ‘slips’ is the inability to provide quality subtitles for the London 2012 Olympic opening ceremony, something which didn’t go unnoticed on Twitter thanks to the campaigning work of Pesky People (@PeskyPeople #subtitlesnow).

    So what’s the problem with the new digital broadcasting frequency that has been introduced in the United Kingdom?

    The old analogue broadcasting system allowed a subtitling service to be combined with the signal and delivered flawlessly to the audience. This system was called Ceefax, and subtitles were activated by entering 888 on the remote control.

    Now the Ceefax system had three main characteristics:
    1. Not pleasant to look at
    2. Strangely familiar
    3. The same age as me
    Now I don’t want to get into trouble over the politics of why services under the new digital network have ended or been replaced, although that would be nice. Instead I want to return to the point of starting this blog post, which is: is it possible to create a subtitling service based on current and emerging developments?

    Voice to text is one of those technological achievements that hasn’t really lived up to expectations, although a successful application in the ‘real’ world would be a game changer and maybe a DIY developer team in the United Kingdom might have shown the rest of us what is possible.

    Using a Raspberry Pi running the Debian operating system, a few tablet devices, microphones and visual glasses to relay text to the user, the team showed that it is possible not only to convert voice to text but also to translate it in real time, with only a small delay to allow the system to cope with the requests.


    **The next blog post will be a continuation of this theoretical discussion, examining in more detail this and other applications that will allow us to provide an online subtitling service.

    Further reading:

    Additional Online Reading:


    Saturday 28 July 2012

    Accessibility Now!


    The Olympic Games opened in London last night. The celebrations began with an almighty bang and the United Kingdom came alight with excitement.

    www.london2012.com
    As the BBC is responsible for the broadcast, and is an organisation supported by the people of Great Britain through a special licence fee agreement enforceable by law, you would have thought that this great innovative organisation would have made the broadcast accessible.

    Well, sadly not, according to Pesky People!

    Pesky People work in the space in digital media where “Disability meets Digital - campaigning to improve digital access for Disabled and Deaf people”
    ~ Pesky People’s mission statement.

    Accessibility on the web is an absolute discredit: the programming talent is available to provide integrated accessible online services, and yet at this stage of online technological development, creators and producers of online content still do not plan for the additional time needed to ensure that accessibility requirements are met for their audience.

    The larger technology companies are trying to support new techniques to convert speech to text, like YouTube’s automatic captioning facility, but according to a recent blog post by Pesky People on their website this type of approach isn't working.


    www.peskypeople.co.uk



    Personally I think that we are engrained as a task based society in the western world and that we forget that we also need to be responsible for the content that we produce, and that responsibility should be planned into the work pattern at the point of conception not simply injected at the end of a project. 

    **Tomorrow's blog post will be a theoretical discussion of a possible innovation to provide high-quality subtitling to mainstream broadcasting services using current and emerging technology.

    Further reading: