Category Archives: Tech

Resolving HTTP Redirects in Python

Since everyone is using short URLs these days and we sometimes just need to know where a URL leads, I wrote this handy little function to find out for us. Redirection can be a tricky thing: there are 301 (“permanent”) and 302 (“temporary”) status codes, and there can be multiple layers of redirection. The simplest approach is this: whenever the server returns a Location HTTP header whose value is not the same as the URL we requested, we can be pretty sure it’s a redirect. The function below uses the HTTP HEAD method to request only the headers, so as not to waste bandwidth, and recursively calls itself until it gets a non-redirecting result. As a safeguard against infinite recursion there is a depth counter.

import urlparse
import httplib

# Recursively follow redirects until there isn't a Location header.
# Note: this uses HTTPConnection, so it only handles plain http:// URLs.
def resolve_http_redirect(url, depth=0):
    if depth > 10:
        raise Exception("Redirected %d times, giving up." % depth)
    o = urlparse.urlparse(url, allow_fragments=True)
    conn = httplib.HTTPConnection(o.netloc)
    path = o.path
    if o.query:
        path += '?' + o.query
    # HEAD fetches only the headers, saving bandwidth.
    conn.request("HEAD", path)
    res = conn.getresponse()
    headers = dict(res.getheaders())
    if 'location' in headers and headers['location'] != url:
        return resolve_http_redirect(headers['location'], depth + 1)
    else:
        return url
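
For example, calling it on a shortened link prints the final destination (the short URL below is made up for illustration):

print resolve_http_redirect("http://bit.ly/example")  # hypothetical short URL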

Simple Skew Animation with ActionScript 3

I couldn’t find any good examples of a simple tween of a skew effect in ActionScript 3, so I thought I’d share what I came up with.

The problem is that skew is not a property on the MovieClip like x or height or the other properties you’re used to tweening with fl.transitions. To apply a skew effect in AS3 you need to use a matrix transform like this:

    import flash.geom.Matrix;
    var matrix:Matrix = new Matrix();
    matrix.b = _y * Math.PI/180;
    matrix.c = _x * Math.PI/180;
    matrix.concat(my_mc.transform.matrix); // preserve the clip's existing transform
    my_mc.transform.matrix = matrix;

or

    import flash.geom.Matrix;
    var matrix:Matrix = new Matrix(1, _y * Math.PI/180, _x * Math.PI/180, 1, 0, 0);
    my_mc.transform.matrix = matrix;

where _y and _x are the skew angles in degrees (the Math.PI/180 factor converts them to radians). Unfortunately, when you update a property of the transform matrix like

my_mc.transform.matrix.b = _y * Math.PI/180;

it doesn’t update the MovieClip. You need to actually re-assign the matrix to trigger an update, so you can’t simply tween my_mc.transform.matrix.b directly. Here’s my solution.

    import fl.transitions.Tween;
    import fl.transitions.easing.*;
    import fl.transitions.TweenEvent;
    import flash.geom.Matrix;

    var mymatrix:Matrix = new Matrix(1, 0, 0, 1, 0, 0);

    // Re-assign the matrix on every tween update so the change actually renders.
    function reassignMatrix(e:TweenEvent):void {
        my_mc.transform.matrix = mymatrix;
    }

    // Tween the skew factor (matrix.b) toward tan(angle).
    var skewTween:Tween = new Tween(mymatrix, 'b', Regular.easeOut, mymatrix.b, Math.tan(_y * (Math.PI/180)), 10);
    skewTween.addEventListener(TweenEvent.MOTION_CHANGE, reassignMatrix);

Note this will overwrite the scale and rotation properties, which are a part of the transform matrix. See the Adobe livedocs for more information.

Three Alternative Inputs for Video Games

When you use a computer enough it can start to feel like the mouse becomes a part of you. Very quickly you forget about consciously moving it right or left to control the cursor as it becomes second nature. Essentially it is an extension of your physical self, with which you manipulate the screen as naturally as you pick up an object with your hand. Combined with a keyboard it is the only form of input for most ‘serious’ video games in the home because of the level of precision required. I think there are other forms of input that can be used in conjunction with traditional mouse and keyboard for a much richer, more immersive experience.

  1. Eye tracking
  2. Voice control
  3. Video gestures

Eye tracking has been employed in usability and behavioral studies for over a decade, but it has so far been demonstrated only in a very limited way as an actual control input. These two videos demonstrate eye tracking used to control camera movement in place of the mouse, and to aim in place of what would normally be a physical gun in an arcade.

In the case of a free camera the player moves the camera by looking towards the edge of the screen. When the eyes are centered the camera is still. This is the same kind of control that a joystick uses. Without having tried it myself, I feel like it could be unintuitive because when we look at something we don’t expect it to move away. With the second example you can see that with a fixed camera in place of a physical gun, eye tracking is quite effective, but I don’t see any major advantages over the traditional method.

In first person shooters I feel the best combination will be mouse for camera control and eye tracking for aim. When you look at something on the screen it doesn’t move away – you shoot precisely where you look. Gamers can use the same intuitive interface they’re already used to for moving the character around and firing the weapon. For the quickest, most precise control of aiming, the lightning-fast twitch reflex of eye movement is perfect and, in fact, is something players already do.

In 2005, Erika Jonsson of the Royal Institute of Technology in Sweden conducted a study of the methods I described and arrived at similar conclusions:

The result showed that interaction with the eyes is very fast, easy to learn and perceived to be natural and relaxed. According to the usability study, eye control can provide a more fun and committing gaming experience than ordinary mouse control. Eye controlled computer games is a very new area that needs to be further developed and evaluated. The result of this study suggests that eye based interaction may be very successful in computer games.

The primary problem facing this method is the cost of hardware. Current technologies are used in a limited fashion in academic and usability studies and are generally not available on the mass market. Accurate eye tracking is apparently not an easy thing to do. I’m guessing that, were there a serious market opportunity, some enterprising group of young researchers could simplify the hardware and prepare it for mass production, bringing costs down to reasonable levels for a luxury product.

Voice control can be used for high-level commands that might otherwise be accessed from menus or complex key combinations. Using voice saves the player from having to break from the immersive experience of controlling the character. It has the additional side effect of engaging other parts of the brain and encourages the more realistic style of interaction that people use in daily life.

The jury is still out on whether people like talking to their computer. I think reception will rest, as usual, on the intuitiveness of the commands. The game should never force the player to use voice commands – there always needs to be another path of access – and the player should never have to remember what that command or sequence is. You just say what you want to happen. I think this could be especially interesting in non-linear story lines where the options aren’t necessarily clear to the player. Instead of selecting from a menu of pre-defined choices the player could explore (as long as they don’t feel they’re searching in the dark).

Video gestures have been used as a fun, gimmicky activity with the PS3 Eye and soon with Microsoft’s Project Natal, but I haven’t seen a use that actually results in interesting gameplay in what we call “serious” games. One idea is to take hints from the camera rather than direct input. When communicating with teammates in multiplayer, a user might say a command in voice chat and point in a general direction relative to where he’s looking. The camera could take that hint and cause his character to also point in that direction. This is not something that can be used for precise control, but we can attempt to mimic non-essential body language as added value.

When combined with voice commands, video gestures could enhance the level of realism and the naturalness of the interface. For instance, the direction of the face could indicate whether the player is giving a command to the game or talking to another person in the room. In storytelling dialog, more than one player can give input, and the video indicates which of them is talking. Again, I think the best we can do with camera input at this point is imprecise input. Project Natal looks like it might do a great job in the casual games space, but we’ll have to see it in the wild controlling games by third-party developers.

Visualizing Wikipedia As One Large Node Graph

As a network visualization tool, node graphs are an intuitive method of communicating relationships between entities. I’ve been thinking a lot about the semantic web lately and thought it would be cool to visualize all of the links between articles in Wikipedia at once. I want to pull back and get the 10,000 foot view of the state of modern knowledge, which I don’t think has been done before in a comprehensible way. Chris Harrison’s WikiViz project comes closest but it quickly becomes incomprehensible and is not dynamic.

I have not yet found a tool capable of pulling this off. There are two key ideas that go into representing information at such vast scale. First, we need to be able to show detailed information in a narrow focus without getting bogged down when zoomed out, which means representing the graph at different resolutions. This problem has already been solved for viewing images at scale: Google Earth represents the earth at vastly different resolutions, and GigaPan can zoom into images many gigapixels in size. Second, the kind of information you’re displaying needs to make sense at any height. That means when you’re looking at the graph from 10,000 feet it shouldn’t devolve into a gray blur. Google Maps demonstrates this by removing detail such as building names, street names, cities, and states as you zoom out. Because I’m a gamer, I’m inspired by Supreme Commander, which developed an innovative way of showing tactical information: you can zoom out to see the playing field as a whole and seamlessly zoom in to examine an area in detail. When zoomed out, individual units become symbols that still convey what the unit is.

At a detailed level a single node could contain basic information including the name, some classification, and perhaps a summary. We can use a sunburst-style visualization to represent links. As you zoom out, that detail gradually disappears. At a high level, less significant articles can be represented by generalized concepts. Higher yet, even more general concepts begin to swallow up the more specific ones. The higher you get, the more general the concept. Less significant links between concepts fade into the background. The big challenge is reliably building a concept tree for any node in Wikipedia. A lot of research and effort has gone into that area, but it’s not quite there yet. People would be forgiving about accuracy to start with.

So here’s a summary of the requirements for a tool to visualize Wikipedia:

  • Must handle 3.2 million nodes and tens of millions of edges (links)
  • Must be able to modify the graph dynamically to highlight changes in real time. This means we need something other than the standard spring-layout algorithm, which has computational complexity O(n^2).
  • Must be able to represent clusters of nodes symbolically as concepts and move gradually between levels of detail
  • Must be able to operate with partial graph data. The client application will see only a small slice of the network at once or a high level view of a larger area at a low resolution.

In my brief analysis there are very few tools designed to handle large data sets dynamically. AI3 has a list of 26 candidate tools for large scale graph visualization and although some are visually stunning and some are large scale, none satisfy the requirements above. It seems like the major innovation needed here is a map-reduce style algorithm for undirected graphs. Map-reduce works well with a tree structure, but not as well with unstructured, cyclic graphs. In Wikipedia any node can be linked to any other node and there’s no consistent parent-child relationship. Everything is an “unowned” entity. If a comprehensive and reliable concept hierarchy could be generated from Wikipedia links and text we might be able to use that as the tree-like structure where each level of the tree roughly represents one resolution of the network.
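
As a rough illustration of that coarsening idea, here is a minimal sketch, assuming a node-to-concept mapping already exists (the articles and concepts below are hypothetical). It collapses article-level links into weighted concept-level edges, which is the kind of reduction each lower “resolution” of the graph would need:

from collections import defaultdict

# Collapse an undirected article-link graph one level up a concept hierarchy.
# edges: iterable of (article_a, article_b) link pairs
# concept_of: dict mapping each article to its parent concept
def coarsen(edges, concept_of):
    weights = defaultdict(int)
    for a, b in edges:
        ca, cb = concept_of[a], concept_of[b]
        if ca == cb:
            continue  # links within one concept disappear at this resolution
        key = tuple(sorted((ca, cb)))
        weights[key] += 1  # count how many article links merged into this edge
    return dict(weights)

# Hypothetical articles and concepts, purely for illustration.
links = [("Python", "Guido van Rossum"), ("Python", "Perl"), ("Perl", "Larry Wall")]
concepts = {"Python": "Programming languages", "Perl": "Programming languages",
            "Guido van Rossum": "People", "Larry Wall": "People"}
print(coarsen(links, concepts))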

Anyway – that’s something to think on.

UPDATE: A new open-source project called Gephi looks really interesting. http://gephi.org

Here are some more links of interest:
http://cytoscape.org/screenshots.php
http://arxiv.org/abs/cs/0512085
http://blog.semanticfoundry.com/2009/06/01/dynamic-visualization-introduction-theory/

Upgrading to MySQL 5.1 and PHP 5.3.0 on Centos with Plesk

This is an account of my experience trying to upgrade MySQL to version 5.1 on my MediaTemple (dv) 3.5 dedicated virtual server. It may help people not on MediaTemple servers, especially if you’re trying to upgrade MySQL without damaging Plesk. There are a lot of forum posts out there that lead you down the wrong path so beware.

Before you go any further: this could really mess up your server, and it’s not totally tested, so if you have anything you depend on, be very careful. In this process I actually had to do a complete restore because I screwed the server up so royally before I figured everything out.

First, obviously, you need to install the developer tools through the (mt) control panel, which enables root access and SSH. Then you need to install yum, following the directions in the (mt) knowledgebase article. There’s also a note at the bottom of that article about removing SiteBuilder, which you should do, but don’t bother installing the atomix repository.

Then you’ll see that MySQL 5.0.x is the latest version in the existing repositories. Remi’s repo has all the latest goodies, so we need to enable it. This page is helpful: http://blog.famillecollet.com/pages/Config-en

wget http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-4.noarch.rpm
wget http://rpms.famillecollet.com/enterprise/remi-release-5.rpm
rpm -Uvh remi-release-5*.rpm epel-release-5*.rpm

A quirk about CentOS is that after adding a repo it’s not actually enabled by default. There are two ways to get access to its packages: either add the option --enablerepo=remi to the yum command, or edit the /etc/yum.repos.d/remi.repo file and set enabled=1. Now if you run yum --enablerepo=remi info mysql it SHOULD report the latest 5.1.x version. But there are a couple of other tricky bits.

If you try upgrading MySQL you’ll notice that two packages create conflicts: php-mhash and php-ncurses. Those two packages are safe to remove (yum remove php-mhash php-ncurses). Now you should be able to upgrade MySQL, which will also upgrade PHP to 5.3.0 (but if you try to upgrade PHP alone, you’ll get errors): yum --enablerepo=remi upgrade mysql

You’re almost there. You’ll notice that MySQL won’t start, which is because there are old settings in /etc/my.cnf that trip it up. I’m doing this upgrade on a completely fresh system and don’t have to worry about losing any settings or existing databases, so I moved /etc/my.cnf to a new location and restarted MySQL, which regenerates a fresh my.cnf. The final step is to run “mysql_upgrade -u admin -p”. The password is your Plesk admin password.

The one unsolved part of this mystery is updating the Plesk tables. All of those tables fail the upgrade script. Still, I’ve clicked around and Plesk seems to be working OK.

Summary of Free Graphical Data Modeling Tools

The Goal: To graphically design a relational data model and generate both the DDL to create the database structure and the PHP5 class structure, including public/private member variables and getters, setters, and create/update/delete methods with a working database link. Icing on the cake would be the ability to update the data model graphically after generating the code, producing a patch file for the PHP5 code and an SQL script with the appropriate ALTER/ADD/DROP commands to update the existing schema.
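
To make that concrete, here is a toy sketch (not any of the tools reviewed below) of the kind of generation step I mean; the entity spec format and names are made up for illustration, and a real tool would of course do far more:

# Toy generator: one entity spec in, MySQL-style DDL and a PHP5-style class out.
def make_ddl(table, columns):
    cols = ",\n    ".join("%s %s" % (name, sqltype) for name, sqltype in columns)
    return "CREATE TABLE %s (\n    %s\n);" % (table, cols)

def make_php_class(table, columns):
    fields = "\n".join("    private $%s;" % name for name, _ in columns)
    accessors = "\n".join(
        "    public function get_%s() { return $this->%s; }\n"
        "    public function set_%s($v) { $this->%s = $v; }" % (n, n, n, n)
        for n, _ in columns)
    return "class %s {\n%s\n%s\n}" % (table.capitalize(), fields, accessors)

spec = [("id", "INT AUTO_INCREMENT PRIMARY KEY"), ("name", "VARCHAR(255)")]
print(make_ddl("customer", spec))
print(make_php_class("customer", spec))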

Eclipse Frameworks

Eclipse has a complex collection of layered frameworks; some seem a little redundant, and for others it isn’t immediately clear what functionality they provide. Most of these frameworks are in the “incubation” phase. There are also proprietary plugins that build on these frameworks but are not distributed through the official Eclipse repositories. The frameworks serve simply as platforms to aid the development of functional plugins, primarily by third parties. The Eclipse project itself doesn’t provide any implemented solutions except for UML2Tools, which is a component of MDT.

The Eclipse Modeling Project consists of the following:

  • Eclipse Modeling Framework Project (EMF) – (link) The EMF project is a modeling framework and code generation facility for building tools and other applications based on a structured data model. From a model specification described in XMI, EMF provides tools and runtime support to produce a set of Java classes for the model, along with a set of adapter classes that enable viewing and command-based editing of the model, and a basic editor.
  • Model Development Tools (MDT) – (link) The Model Development Tools (MDT) project focuses on big “M” modeling within the Modeling project. Its purpose is twofold: To provide an implementation of industry standard metamodels. To provide exemplary tools for developing models based on those metamodels.
    • UML2Tools (link) is a set of GMF-based editors for viewing and editing UML models. It is an optional component of the MDT project. It does not provide entity relationship modeling so I’m not going to say much more about it.
  • The Eclipse Graphical Modeling Framework (GMF) (link) provides a generative component and runtime infrastructure for developing graphical editors based on EMF and GEF. The project aims to provide these components, in addition to exemplary tools for select domain models which illustrate its capabilities.
  • The Graphical Editing Framework (GEF) (link) allows developers to create a rich graphical editor from an existing application model. GEF consists of two plug-ins. The org.eclipse.draw2d plug-in provides a layout and rendering toolkit for displaying graphics. The developer can then take advantage of the many common operations provided in GEF and/or extend them for the specific domain. GEF employs an MVC (model-view-controller) architecture which enables simple changes to be applied to the model from the view. GEF is completely application-neutral and provides the groundwork to build almost any application, including but not limited to: activity diagrams, GUI builders, class diagram editors, state machines, and even WYSIWYG text editors. The Logic Example, one of the provided examples, is pictured below.

Free Standalone and Plugin Implementations

Clay Database Modeling

Clay (link) is an Eclipse plugin by Azzurri, a Japanese company. The free version provides nearly all the features needed to build a proper ERD. The Pro version adds support for enterprise databases, printing and exporting images, and document generation. Clay exports clean DDL code for MySQL. It can also reverse-engineer databases.

Conclusion: This is a decent choice. It won’t lock you into using it. It doesn’t generate an interface for the application layer so it doesn’t fulfill the goal of this study.

RISE

RISE (link) is a freemium, proprietary, Windows-only solution, but it does some good stuff. RISE is the only free solution I’ve found that will generate both the data layer and the application layer based on an entity relationship diagram. It has one of the easier-to-use interfaces for creating entities, attributes, relations, and views. The stereotypes concept makes building common structures such as trees, lists, classifications, and extensions easier. Connecting a customizable Interface to an entity or view allows C# or PHP code to be generated, giving your application layer access to the data layer. The application code fully implements methods that perform standard operations on your data layer. RISE will even generate a SOAP web service interface and provides an AJAX framework.

Despite all that praise, there are a few problems with RISE (aside from it being Windows-only, etc.). There doesn’t seem to be a way to create indexes, primary keys, or auto-incremented values, or to adjust the precision of data types.

What RISE generates is not proper DDL or a database that actually reflects your ERD. It produces a stored procedure that creates four tables for keeping track of the log and model versions in addition to the actual entity tables. This is presumably so it can gracefully upgrade the structure to a new version without losing data, but it results in an unclean database.

Conclusion: RISE may be a good choice for some applications, but the fact that it doesn’t follow standards makes me disqualify it for lack of extensibility if you should ever decide to stop using it.

ArgoUML

ArgoUML (link) is an open-source, standalone UML and code-generating application written in Java and maintained by Tigris.org, the same group behind Subversion. It doesn’t support modeling database schemas out of the box, but it does have an officially supported plugin that does, called argouml-db. As of this writing the latest argouml-db plugin isn’t supported in the latest build of ArgoUML, so you’ll need to get the bundled version from the SourceForge site https://sourceforge.net/projects/dbuml.

My impression is that the db plugin is a hacked-together job and isn’t very well maintained. It’s tricky to even get running and clumsy for adding attributes to a table. The properties panel also lacks basic options you would expect for DDL, such as column lengths, auto_increment, or null / not null. Generating code from the bundled version gives the option of generic “SQL” and Java. The SQL didn’t work out of the box and the generated file contained only an error. The Java code didn’t include any getters, setters, or a link to the database, and instead of primitive data types it imported types.sql.VARCHAR and similar.

Conclusion: not worth anyone’s time for ERDs

Umbrello

Umbrello (link) is an open-source application that is part of the KDE project, which primarily targets Linux distributions. Getting the latest Umbrello working on Mac or Windows can be a bit more of a pain. MacPorts and Fink are package managers for the Mac, and both have up-to-date versions of most mainstream Linux software, so you can compile most of KDE, including Umbrello, from source. It might be something you want to do overnight, and expect to encounter errors which you’ll have to find your way around. Needless to say, this is not a user-friendly installation in non-Linux environments.

Umbrello does a decent job of modeling relational databases (entity relationship model) and includes all the expected features, including data types and properties, auto-increment, foreign key constraints, etc. It does a fairly solid job of generating MySQL or PostgreSQL DDL, but it won’t export the data model to PHP. You’ll have to create a separate class diagram for that.

Conclusion: if you run Linux this is a decent option considering the alternatives.

Non-free solutions

Conclusions

This is a relatively uncrowded market for robust solutions. The only two tools I found that satisfy the core requirements of my goal are Visual Paradigm’s Database Visual Architect, an enterprise-level proprietary solution, and, in a somewhat broken way, RISE, a free solution. No other tools produce both data-layer and application-layer code. I discount RISE for not keeping with standard methods and not producing extensible, clean code. I discount DB Visual Architect for its prohibitive cost in non-enterprise environments.

My chosen solution is the Clay Database Modeling plugin for eclipse, which does a good job of modeling and exporting DDL for the data layer. The application layer will need to be designed and implemented separately.

Comments are welcome on these solutions or any others not included.

Conficker C and a future with self-evolving computer viruses

I’m absolutely enthralled by the Conficker C virus after reading this analysis from SRI International. The C variant is the third major generation of the Conficker virus and demonstrates the highest level of sophistication found in any computer virus or worm to date. What excites me most about it is the decentralized, peer-to-peer method it uses to quickly propagate updates to itself across the roughly 12 million infected computers around the globe.

Researchers have identified April 1st as the date when the virus wakes from hibernation. The event has been simulated with a copy of the virus in computer science laboratories, but since the virus gets its instructions through the peer-to-peer network and rendezvous points, it is impossible to tell what kind of code will be executed until it happens. Speculation has ranged from the biggest April Fools’ Day joke ever to a massive dragnet or “Dark Google” allowing the virus’s authors to search for sensitive information on infected machines en masse and sell it to criminal organizations or governments, which is especially worrisome since Conficker has infected many government and military networks.

The authors have demonstrated cutting-edge knowledge in multiple disciplines, so this is not just a kid sitting in his mother’s basement. This is a closely coordinated effort by a group of extremely talented individuals, and I personally wouldn’t be surprised if the authors, if caught, turn out to be part of a government initiative. The SRI report says “those responsible for this outbreak have demonstrated Internet-wide programming skills, advanced cryptographic skills, custom dual-layer code packing and code obfuscation skills, and in-depth knowledge of Windows internals and security products.”

Conficker C does a mutex check with pseudo-randomly generated names when it initially installs, to avoid overwriting itself. Then it patches the Win32 net API to inhibit antivirus software and block antivirus websites. The fact that it patches only in-memory DLLs and not persistently stored DLLs means that removal tools can’t simply replace the compromised files with clean ones. It also patches the same Windows vulnerability it initially used to enter the system, but leaves a back door so that only new variants of Conficker can use it. This prevents other viruses from piggybacking on Conficker and competing for control of the system.

Conficker spawns a thread that searches for a static list of known antivirus applications and terminates them to defend itself from attack, and it blocks the services that allow antivirus software to auto-update. It also deletes all Windows restore points and removes Safe Mode as a boot option. Conficker uses dual-layer encryption and code obfuscation to hinder efforts at reverse engineering it. Conficker released an update just a few weeks after the new MD6 hashing method became publicly available from the original researchers at MIT.

The authors use two similar methods of propagating updates to infected machines. Previous Conficker variants use a clever “rendezvous” system: each machine pseudo-randomly generates a huge list of possible rendezvous locations, changing daily, where the authors may have placed a distribution server. The randomized nature and the large number of possible locations make efforts to block those domains impractical. Once a machine has an update it can also assist in spreading it to other machines via the peer-to-peer network. Currently, almost all p2p networks require some kind of “seed” or predefined peer list to be introduced into the network, but Conficker doesn’t: it uses the same kind of pseudo-randomly generated destination list as the rendezvous system to generate an initial peer list, essentially bootstrapping itself into the network. There is no single bottleneck that can be attacked to stop Conficker from communicating with its peers.
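
To illustrate the rendezvous idea in general terms (this is a generic sketch of a date-seeded candidate list, not Conficker’s actual algorithm), every infected host can independently derive the same daily set of candidate domains without any of them being hard-coded:

import datetime
import random

# Generic illustration of a date-seeded domain list; NOT Conficker's real generator.
def daily_rendezvous_candidates(count=5, tlds=(".com", ".net", ".org")):
    # Seed from today's date so every peer independently computes the same list.
    rng = random.Random(datetime.date.today().toordinal())
    domains = []
    for _ in range(count):
        label = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz")
                        for _ in range(rng.randint(8, 12)))
        domains.append(label + rng.choice(tlds))
    return domains

print(daily_rendezvous_candidates())

Because a defender would have to anticipate and block every candidate for every day, the list only has to be large enough to make that impractical.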

Some of my own ideas

Antivirus filters and the coordinated strategies used to thwart the spread of viral software rely on patterns to identify uninvited guests. I see future decentralized malware using a randomized approach to avoid detection. If the same application is also capable of virally updating its peers, a system of natural selection will emerge. This is essentially how genetic algorithms work. In this case, the natural fitness function is the ability to infect new systems (spawning offspring) and the defensive ability to ward off efforts to remove it from the host (self-preservation). In this sense, randomized configurations (the genetic code) of the virus will be propagated to new systems and over existing instances at a higher rate, and thus the more successful variants would become the most prevalent. And there we have a model of evolution.

The process works the same way as HIV and flu viruses and has the effect of a self-healing, growing network that can autonomously adapt to new countermeasures developed by antivirus companies. A self-evolving computer virus has the advantage over biological evolution of electronic speed. Time is the greatest enemy of evolution. Digital organisms have the chance to excel beyond anything we have observed in nature.

Fortunately, genetic programming hasn’t yet advanced to the stage where the scenario I proposed is practical. Current genetic programming algorithms are limited to a constant set of numerically variable attributes that are randomly modified in each generation. Those attributes could never be complex enough to resemble an actual evolving organism and too much of the logic for a computer virus has to be pre-programmed and static. I expect that to change over the next decade as more work is done in this area so watch out.

Links: