April 20, 2011 (Computerworld)
Got data? These useful tools can turn it into informative, engaging graphics.
You may not think you’ve got much in common with an investigative journalist or an academic medical researcher. But if you’re trying to extract useful information from an ever-increasing inflow of data, you’ll likely find visualization useful — whether it’s to show patterns or trends with graphics instead of mountains of text, or to try to explain complex issues to a nontechnical audience.
There are many tools around to help turn data into graphics, but they can carry hefty price tags. The cost can make sense for professionals whose primary job is to find meaning in mountains of information, but you might not be able to justify such an expense if you or your users only need a graphics application from time to time, or if your budget for new tools is somewhat limited. If one of the higher-priced options is out of your reach, there are a surprising number of highly robust tools for data visualization and analysis that are available at no charge.
Here’s a rundown of some of the better-known options, many of which were demonstrated at the Computer-Assisted Reporting (CAR) conference last month. Others are not as well known but show great promise. They range from easy enough for a beginner (i.e., anyone who can do rudimentary spreadsheet data entry) to expert (requiring hands-on coding). But they all share one important characteristic: They’re free. Your only investment: time.
Data cleaning
Before you can analyze and visualize data, it often needs to be «cleaned.» What does that mean? Perhaps some entries list «New York City» while others say «New York, NY» and you need to standardize them before you can see patterns. There might be some records with misspellings or numerical data-entry errors. The following two tools are designed to help get your data in tip-top shape to be analyzed.
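What does standardizing look like in practice? Here is a minimal Python sketch of the «New York» case. It is not taken from either tool below. A hand-built lookup table (the variants in it are hypothetical) maps known spellings to one canonical form, and anything it doesn’t recognize is passed through unchanged for a human to check.

```python
# Hypothetical lookup table: known variant (normalized) -> canonical spelling.
CANONICAL = {
    "new york city": "New York, NY",
    "nyc": "New York, NY",
    "new york, ny": "New York, NY",
}

def standardize(value: str) -> str:
    """Return the canonical spelling for a known variant, else the trimmed input."""
    key = " ".join(value.split()).lower()  # collapse extra whitespace, ignore case
    return CANONICAL.get(key, value.strip())

cities = ["New York City", "new york,  NY", "Boston", " NYC "]
print([standardize(c) for c in cities])
```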
DataWrangler
What it does: This Web-based service from Stanford University’s Visualization Group is designed for cleaning and rearranging data so it’s in a form that other tools such as a spreadsheet app can use.
Click on a row or column, and DataWrangler will suggest changes. For example, if you click on a blank row, several suggestions pop up such as «delete row» or «delete empty rows.»
There’s also a history list that allows for easy undo — a feature that’s also available in Google Refine (reviewed next).
What’s cool: Text editing is especially easy. For example, when I selected «Alabama» in one row of sample data headlined «Reported crime in Alabama» and then selected «Alaska» in the next group of data, it led to a suggestion to extract every state name. Hover your mouse over a suggestion, and you can see affected rows highlighted in red.
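The transformation DataWrangler suggested here, pulling the state out of each «Reported crime in ...» heading and attaching it to the rows beneath, can be sketched in a few lines of Python. This illustrates the idea only; it is not DataWrangler’s code, and the row layout and numbers are made up.

```python
import re

# Heading rows are assumed to look like "Reported crime in <State>".
HEADING = re.compile(r"^Reported crime in (.+)$")

def extract_states(rows):
    """Prefix each data row with the state named in the nearest heading above it."""
    state = None
    tagged = []
    for row in rows:
        match = HEADING.match(row[0])
        if match:
            state = match.group(1)  # remember the state for the rows that follow
        elif any(cell.strip() for cell in row):  # skip blank rows
            tagged.append([state] + row)
    return tagged

# Made-up sample rows: year, value.
rows = [
    ["Reported crime in Alabama"],
    ["2004", "1000"],
    ["2005", "1100"],
    [""],
    ["Reported crime in Alaska"],
    ["2004", "900"],
]
for tagged_row in extract_states(rows):
    print(tagged_row)
```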
Drawbacks: I found that unexpected changes occurred as I attempted to explore DataWrangler’s options; I constantly had to click «clear» to reset. And not all suggestions are useful («promote row to header» seemed an odd suggestion when the row was blank) or easy to understand («fold split 1 using 2 as key»).
And while the fact that DataWrangler is a Web-based service makes it convenient to use, don’t forget that it sends your data off to an external site — which means it isn’t an option for sensitive internal information. However, there are plans for a future release of a stand-alone desktop version. Another important thing to keep in mind is that DataWrangler is currently alpha code, and its creators say it’s «still a work in progress.»
Skill level: Advanced beginner.
Runs on: Any Web browser.
Learn more: There’s a screencast on the DataWrangler home page. Also, see this post on using DataWrangler to format data (from Tableau Public’s blog).
Google Refine
What it does: Google Refine can be described as a spreadsheet on steroids for taking a first look at both text and numerical data. Like Excel, it can import and export data in a number of formats, including tab- and comma-separated text files and Excel, XML and JSON files.
Refine features several built-in algorithms that find text items that are spelled differently but actually should be grouped together. After importing your data, you simply select Edit cells > Cluster and edit, then choose which algorithm you want to use. After Refine runs, you decide whether to accept or reject each suggestion. For example, you could say yes to combining Microsoft and Microsoft Corp., but no to combining Coach Inc. with CQG Inc. If it’s offering too few or too many suggestions, you can change the strength of the suggestion function.
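To see why some pairs cluster and others don’t, here is a simplified Python sketch of key-collision clustering in the style of Refine’s «fingerprint» method. Each value is reduced to a normalized key (lowercased, with accents and punctuation dropped and words sorted), and values that share a key are grouped. It approximates the idea rather than reproducing Refine’s implementation.

```python
import re
import unicodedata
from collections import defaultdict

def fingerprint(value: str) -> str:
    """Simplified key: lowercase, strip accents and punctuation, sort unique words."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = text.encode("ascii", "ignore").decode("ascii")  # drop accent marks
    text = re.sub(r"[^\w\s]", "", text)  # drop punctuation
    return " ".join(sorted(set(text.split())))

def cluster(values):
    """Group values whose fingerprints collide; return only groups of two or more."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]

names = ["Microsoft Corp.", "microsoft corp", "Corp. Microsoft", "Coach Inc.", "CQG Inc."]
print(cluster(names))
```

Plain «Microsoft» would not join this cluster, because its key lacks «corp», and «Coach Inc.» and «CQG Inc.» stay apart. Catching near-misses like the first takes a looser, distance-based method, which also produces false positives like the second. That is why reviewing each suggestion matters.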
There are also numerical options that offer quick and easy overviews of data distributions. These can reveal anomalies that might be the result of data-entry errors, such as $800,000 instead of $80,000 for a salary entry. They can also expose inconsistencies, such as differences in the way compensation data is reported from entry to entry, with some showing, say, hourly wages and others showing weekly pay or yearly salaries.
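A crude version of that anomaly check takes only a few lines. This Python sketch flags values that sit far above or below the median. The factor of 5 is an arbitrary illustrative threshold, not what Refine uses, and the salary figures are invented.

```python
from statistics import median

def flag_outliers(values, factor=5.0):
    """Return values more than `factor` times above or below the median.

    Assumes positive numbers; the default factor is an illustrative choice.
    """
    mid = median(values)
    return [v for v in values if v > mid * factor or v < mid / factor]

salaries = [72_000, 80_000, 85_000, 800_000, 78_000, 41]
print(flag_outliers(salaries))
```

Here both the $800,000 typo and the 41 (plausibly an hourly wage in a column of yearly salaries) would be flagged for a closer look.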
Beyond data housekeeping, Google Refine offers some useful analysis tools, such as sorting and filtering.
What’s cool: Once you get used to which commands do what, this is a powerful tool for data manipulation and analysis that strikes a good balance between functionality and ease of use. The undo/redo list of every action you’ve taken lets you roll back when needed. And text functions handle Java-syntax regular expressions, allowing you to look for patterns (such as, say, three numbers followed by two digits) as well as specific text strings and numbers.
Finally, while this is a browser-based application, it works with files on your desktop, so your data remains local.
Drawbacks: Although Google Refine looks like a spreadsheet, you can’t do typical spreadsheet calculations with it; for that, you must export to a conventional spreadsheet application. If you’ve got a large data set, carve out some time in your day to go through all of Refine’s suggested changes, since it can take a while. And, depending on the data set, be prepared when looking for text items to merge: You’re likely to get either a lot of false positives or missed problems — or both.
Skill level: Advanced beginner. Knowledge of data analysis concepts is more important than technical prowess; power Excel users who understand data-cleaning needs should be comfortable with this.
Runs on: Windows, Mac OS X (if it appears to do nothing after loading on a Mac, point a browser manually to http://127.0.0.1:3333/ ), Linux.
Learn more: These three screencasts give a good overview of why and how you’d use Refine; there’s also fairly detailed documentation on the Google Code project area.
Statistical analysis
Sometimes you need to combine graphical representation of your data with heftier numerical analysis.
The R Project for Statistical Computing
What it does: R is a general statistical analysis platform (the authors call it an «environment») that runs on the command line. Need to find means, medians, standard deviations, correlations? R can handle that and much more, including «linear and generalized linear models, nonlinear regression models, time series analysis, classical parametric and nonparametric tests, clustering and smoothing,» according to the project website.
R also graphs, charts and plots results. There are numerous add-ons to this open-source project that significantly extend functionality. For users who prefer a GUI, Peter Aldhous, San Francisco bureau chief for New Scientist magazine, suggests RExcel, which offers access to the R engine through Excel.
What’s cool: There is a great deal of functionality in R, including quite a number of visualization options as well as numerical and spatial analysis.
Drawbacks: The fact that R runs on the command line means that users will have to take the time to learn which commands do what, and not all users will be comfortable with a text-only interface. In addition, Aldhous says those dealing with large data sets may hit a memory barrier (if so, there’s a commercial option from Revolution Analytics).
Skill level: Intermediate to advanced. Comfort with command-line prompts and a knowledge of statistics are musts for the core application.
Runs on: Linux, Mac OS X, Unix, Windows XP or later.
Learn more: Try R for Statistics: First Steps (PDF) by Peter Aldhous, Hands-on R, a step-by-step tutorial (PDF) by Jacob Fenton, and the project’s own An Introduction to R. The R Statistics blog has a number of visualization samples.
Visualization applications and services
These tools offer a number of different visualization options. While some stick to conventional charts and graphs, many offer a range of other choices such as treemaps and word clouds. A few offer geographical mapping as well, although if you’re interested in maps, our sections on GIS/mapping focus specifically on that.
Google Fusion Tables
What it does: This is one of the simplest ways I’ve seen to turn data into a chart or map. You can upload a file in several different formats and then choose how to display it: table, map, heatmap, line chart, bar graph, pie chart, scatter plot, timeline, storyline or motion (animation over time). It’s somewhat customizable, allowing you to change map icons and style info windows.
There are some data editing functions within Fusion Tables, although changing more than a few individual cell entries can quickly become tedious. You can also join tables (which is important when the data you want to map is spread across multiple tables), as well as filter, sort, add columns and so on. There are also options to allow others to make comments on the data itself.
Mapping goes beyond just placing points, as many of us are accustomed to with Google Maps. Fusion Tables can also map multiple polygons with variations in color based on underlying data, such as this intensity map showing the percentage of households with Internet access by state from 2007 U.S. Census Bureau data.
The Knight Digital Media Center notes that a handy undocumented feature allows the use of Fusion Tables’ «templating» export to generate a JSON file from data in other formats. JSON is required by some APIs and JavaScript libraries.
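If you’d rather not depend on a hosted service for that conversion, the same CSV-to-JSON step can be sketched with Python’s standard library. The column names and values below are made-up placeholders. Note that CSV values come through as strings, so numbers may need converting before a JavaScript library can plot them.

```python
import csv
import io
import json

def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps(list(reader), indent=2)

# Made-up sample data.
sample = "state,pct_internet\nAlabama,50.0\nAlaska,75.0\n"
print(csv_to_json(sample))
```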
Unlike IBM’s Many Eyes, Google lets you designate your data as private or unlisted as well as public, although your data still resides on Google’s servers — a benefit or drawback, depending on whether server bandwidth costs or data privacy is more important to you.
What’s cool: Fusion Tables offers relatively quick charting and mapping, including geographic information system (GIS) functions to analyze data by geography. The service also automatically geocodes addresses, which is useful when trying to place numerous points on a map. This is an excellent tool for beginners and advanced beginners to use to get comfortable with analyzing data; it’s also a good fit for people who don’t program. For more advanced users, there’s an API.
Drawbacks: Functionality, customization and data capacity are all limited compared with desktop applications or custom code, and interacting with large data sets on the site can be sluggish. Reliability can be an issue, too: The site choked on March 11, the day of the devastating earthquake and tsunami in Japan. (It is still a Google Labs beta project.)
Skill level: Beginner.
Runs on: Any Web browser.
Learn more: A Google Fusion Tables tour and several tutorials are available. We’ve also got some examples of what it can do in our story «H-1B Visa Data: Visual and Interactive Tools.» Also see the Fusion Tables Example Gallery.
Impure
What it does: Impure is sort of a Yahoo Pipes for data visualization, designed for creating numerous types of highly polished graphical representations of data using a drag-and-drop workspace. The service includes a library of objects and various methods, and — as with Yahoo Pipes — it allows you to click and drag to connect modules so that the output of one becomes the input of another. It was developed by Spanish analytics firm Bestiario.
What’s cool: Impure offers a highly visual interface for the task of creating visualizations, which is not as common as you might expect. It has a sleek user interface and numerous modules, including quite a few APIs that are designed to pull data from the Web. It features numerous visualization types that are searchable by keywords like numeric, tables, nodes, geometry and map. And although it saves your workspaces to the Web, you can copy and save the code behind your workspaces locally, so you can back up your work or maintain your own libraries of code snippets.
Drawbacks: Users of Impure face a surprisingly steep learning curve despite its drag-and-drop functionality. The documentation is detailed in some areas, but lacking in others. For instance, while it was easy to find a list of APIs, it was more difficult to find basic instructions on how to use the workspace — or even figure out that there was a workspace, let alone how to use the various objects and methods.
Once you save your workspace, it’s on the public Web, although it’s unlikely that anyone else will be able to find it unless you share the URL. And I found some of the samples not all that helpful in understanding the underlying data, even if they were visually striking.
Skill level: Intermediate.
Runs on: Any Web browser.
Learn more: To get started, I’d suggest the videos «Interface Basics» (7 minutes) and «Workspaces and Code.» You can find a sample called The Pay Gap Between Men and Women Mapped at the website of British newspaper The Guardian.
Tableau Public
What it does: This tool can turn data into any number of visualizations, from simple to complex. You can drag and drop fields onto the work area and ask the software to suggest a visualization type, then customize everything from labels and tool tips to size, interactive filters and legend display.
What’s cool: Tableau Public offers a variety of ways to display interactive data. You can combine multiple connected visualizations onto a single dashboard, where one search filter can act on numerous charts, graphs and maps; underlying data tables can also be joined. And once you get the hang of how the software works, its drag-and-drop interface is considerably quicker than manually coding in JavaScript or R for most users, making it more likely that you’ll try additional scenarios with your data set. In addition, you can easily perform calculations on data within the software.
Drawbacks: In the free version of Tableau’s business intelligence software, your visualization and data must reside on Tableau’s site. Whenever you save your work, it gets sent up to the public website — which means you can’t save work in progress without running the risk that it will be seen before it’s ready (while Tableau’s site won’t deliberately expose your work, it relies on security by obscurity — so someone could see your work if they guess your URL). And once it’s saved, viewers are invited to download your entire workbook with data. Upgrading to a single-user desktop edition costs $999.
Not surprisingly, all that functionality comes at a cost: Tableau’s learning curve is fairly steep compared to that of, say, Fusion Tables. Even with the drag-and-drop interface, it’ll take more than an hour or two to learn how to use the software’s true capabilities, although you can get up and running doing simple charts and maps before too long.
Skill level: Advanced beginner to intermediate.
Runs on: Windows 7, Vista, XP, Server 2008 and Server 2003.
Learn more: There are seven short training videos on the Tableau site, where you can also find downloadable data files that you can use to follow along.
You can see a sample in our article «Tech Unemployment Climbs; Self-employment Steady.»
Many Eyes
A pioneer in Web-based data visualization, IBM’s Many Eyes project combines graphical analysis with community, encouraging users to upload, share and discuss information. It’s extremely easy to use and very well documented, including suggestions on when to use what kind of visual data representation. Many Eyes includes more than a dozen output options — from charts, graphics and word clouds to treemaps, plots, network diagrams and some limited geographic maps.
You’ll need a free account to upload and post data, although anyone can browse. Formatting is basic: For most visualizations, the data must be in a tab-separated text file with column headers in the first row.
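If your data lives in a program rather than a spreadsheet, producing that tab-separated layout is straightforward. Here is a minimal Python sketch using the standard csv module. The employer names and counts are placeholders.

```python
import csv
import io

def to_tsv(header, rows):
    """Return tab-separated text with the column headers in the first row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

print(to_tsv(["employer", "petitions"], [["Company A", 120], ["Company B", 95]]), end="")
```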
It took me about three minutes to create a bar chart of top H-1B visa employers.
It took perhaps another minute to create a treemap of the same data.
What’s cool: Visualization can’t get much easier, and the results look considerably more sophisticated than you’d expect based on the minimal amount of effort needed to create them. Plus, the list of possible visualization types includes explanations of the types of data each one is best suited for.
Drawbacks: Both your visualizations and your data sets are public on the Many Eyes site and can be easily downloaded, shared, reposted and commented upon by others. This can be great for certain types of users — especially government agencies, nonprofits, schools and other organizations that want to share visualizations on someone else’s server budget — but an obvious problem for others. (IBM does offer a contact form for businesses interested in hosting their own version of the software.) In addition, customization is limited, as is data file size (5MB).
Skill level: Beginner.
Runs on: Any modern Web browser with Java and Flash installed.
Learn more: IBM’s website features pages explaining data formatting for Many Eyes and visualization choices.
You can see some featured visualizations on the Many Eyes home page or browse through some of the tens of thousands of uploads. One interesting map shows popular surnames in the U.S. from the 2000 Census by Martin Wattenberg, one of the creators of Many Eyes.
VIDI
What it does: Although VIDI’s website bills this as a tool for the Drupal content management system, graphics created by the site’s visualization wizard can be used on any HTML page — no Drupal required.
Upload your data, select a visualization type, do a bit of customization selection, and your chart, timeline or map is ready to use via auto-generated embed code (using an iframe, not JavaScript or Flash).
What’s cool: This is about as easy as Many Eyes, with more mapping options and no need to make your visualization and data set public on its website. There are quick screencasts explaining each visualization type and several different color customization options. And the file-size limit of 30MB is six times Many Eyes’ 5MB maximum.
Drawbacks: Oddly, the visualization wizard was a lot easier to use than the embed code: My embedded iframe didn’t display while I was trying to preview it on the VIDI website; I needed to save the visualization and go to the «My VIDI» page to get embed code that actually worked. Also, as with any cloud service, if you’re using this for Web publishing, you’ll want to feel confident that the host’s servers can handle your traffic and will remain available for as long as you need to display the data.
Skill level: Beginner.
Runs on: Any Web browser.
Learn more: The VIDI home page features a link to an 11-minute video tutorial.
It took me less than five minutes to create a sample: a map of earthquakes of 7.0 magnitude or more since Jan. 1, 2000.
Zoho Reports
What it does: One of the more traditional corporate-focused business analytics offerings in this group, Zoho Reports can take data from various file formats or directly from a database and turn it into charts, tables and pivot tables — formats familiar to most spreadsheet users.
What’s cool: You can schedule data imports from sources on the Web. Data can be queried using SQL and can be turned into visualizations, and the service is set up for Web publishing and sharing (although if it’s accessed by more than two users, you will need a paid account).
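To give a sense of the kind of query involved, here is a generic SQL aggregation run through Python’s built-in sqlite3 module. It is a stand-in, not Zoho’s interface, and the sales table is hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 100.0), ("West", 250.0), ("East", 50.0)],
)
# Total per region: the kind of summary a chart or pivot table would display.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)
```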
Drawbacks: Visualization options are fairly basic and limited. Interacting live with the Web-based data can be sluggish at times. Data files are limited to 10MB. I found the navigation confusing at times — for example, after I saved a copy of a sample database, I was told it was in the folder «My reports,» yet I had a hard time finding that.
Skill level: Advanced beginner.
Runs on: Any Web browser.
Learn more: There are video demos and samples on Zoho’s website.
Code help: Wizards, libraries, APIs
Sometimes nothing can substitute for coding your own visualization, especially if the look and feel you’re after can’t be achieved with an existing desktop or Web app. But that doesn’t mean you need to start from scratch, thanks to a wide range of available libraries and APIs.
Choosel (under development)
What it does: This open-source Web-based framework is designed for charts, clouds, graphs, timelines and maps. Right now, it is geared more toward developers who create applications than toward end users who need to save and/or embed their work, but there’s an interactive online demo that lets you quickly upload some data to visualize.
What’s cool: As with Tableau Public, you can have more than one visualization on a page and connect them so that, for example, mousing over items on a chart will highlight corresponding items on a map.
Drawbacks: This is not yet an application that end users can use to store and share their work. And I found the online demo to be finicky about uploading data — even after I corrected field formats for dates (dd/mm/yyyy) and location (latitude/longitude) as documented, my data wouldn’t load until I had another text field added (rather than just having numerical fields). It was also unclear how to customize labels. This project shows promise if it’s further developed and documented.
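If your dates are stored in ISO format (yyyy-mm-dd), converting them to the dd/mm/yyyy form that Choosel’s documentation called for is a one-liner in Python. The ISO input format is an assumption about your source data.

```python
from datetime import datetime

def to_ddmmyyyy(iso_date: str) -> str:
    """Reformat an ISO 8601 date (yyyy-mm-dd) as dd/mm/yyyy."""
    return datetime.strptime(iso_date, "%Y-%m-%d").strftime("%d/%m/%Y")

print(to_ddmmyyyy("2010-01-12"))  # 12/01/2010
```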
Skill level: Expert.
Runs on: Chrome, Safari and Firefox.
Learn more: There’s a short video called Choosel — Timeline and Basic Features and a sample titled Earthquakes With 1,000 or More Deaths Since 1900.
Exhibit
What it does: This spin-off of the MIT Simile Project is designed to help users «easily create Web pages with advanced text search and filtering functionalities, with interactive maps, timelines and other visualization.» Billed as a publishing framework, the JavaScript library allows easy additions of filters, searches and more. The Easy Data Visualization for Journalists page offers examples of the code in use at a number of newspaper websites.
Of course, «easy» is in the eye of the beholder — what’s easy for the professionals at MIT who created Exhibit might not be that simple for a user whose comfort level stops at Excel. Like most JavaScript libraries, Exhibit requires more hand-coding than services such as Many Eyes and Google Fusion Tables. On the other hand, Exhibit has clear documentation for beginners, even those with no JavaScript experience.
What’s cool: For those who are comfortable coding, Exhibit offers a number of views — maps, charts, timeplots, calendars and more — as well as customized lenses (ways to format an individual record) and facets (properties that can be searched or sorted). You’re much more likely to get the exact presentation you want with Exhibit than, say, Many Eyes. And your data stays local unless and until you decide to publish.
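Conceptually, a facet is a count of items per property value plus a filter on the selected value. This Python sketch, which is not Exhibit’s code, applies that idea to data shaped like the JSON files Exhibit reads (a top-level «items» list whose entries carry a «label»). The «state» property and the cities are illustrative.

```python
from collections import Counter

# Illustrative data shaped like an Exhibit JSON file.
data = {
    "items": [
        {"label": "Boston", "type": "City", "state": "MA"},
        {"label": "Worcester", "type": "City", "state": "MA"},
        {"label": "Hartford", "type": "City", "state": "CT"},
    ]
}

def facet_counts(items, prop):
    """Count how many items share each value of `prop` (what a list facet shows)."""
    return Counter(item[prop] for item in items if prop in item)

def apply_facet(items, prop, value):
    """Keep only items whose `prop` equals the selected facet value."""
    return [item for item in items if item.get(prop) == value]

print(facet_counts(data["items"], "state"))
print([i["label"] for i in apply_facet(data["items"], "state", "MA")])
```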
Drawbacks: For newcomers unused to coding visualizations, it takes time to get familiar with coding and library syntax.
Skill level: Expert.
Learn more: There are a number of examples you can look at, including Red Sox-Yankees Winning Percentages Through the Years, U.S. Cities by Population and others.
Note: There are numerous other JavaScript libraries to help create visualizations, such as the recently released Data-Driven Documents and the jQuery Visualize plug-in. Six Revisions’ list of 20 Fresh JavaScript Data Visualization Libraries gives you an idea of how many there are to choose from.
Google Chart Tools
What it does: Unlike Google Fusion Tables, which is a full-fledged, self-contained application for uploading and storing data, and generating charts and maps, Chart Tools is designed to visualize data residing elsewhere, such as your own website or within Google Docs.
Google offers both a Chart API using a «simple URL request to a Google chart server» for creating a static image and a Visualization API that accesses a JavaScript library for creating interactive graphics. Google offers a comparison of data size, page load, skills needed and other factors to help you decide which option to use.
For the simpler static graphics, there’s a wizard to help you create a chart from some sample formats; it goes as far as helping you input data row by row, although for any decent-size data set — say, more than half a dozen or so entries — it makes more sense to format it in a text file.
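For the static option, the «simple URL request» amounts to a chart URL whose query string carries the chart type, size, data and labels. In the sketch below, cht, chs, chd and chl are the Chart API’s parameters for those four things. Google has since deprecated this image-chart service, though, so the URL may no longer return an image. Treat this as an illustration of the approach.

```python
from urllib.parse import urlencode

def pie_chart_url(labels, values, size="400x200"):
    """Build a static-image pie chart URL in the style of Google's Chart API.

    cht = chart type (p = pie), chs = width x height in pixels,
    chd = data in text encoding ("t:" prefix), chl = slice labels separated by |.
    """
    params = {
        "cht": "p",
        "chs": size,
        "chd": "t:" + ",".join(str(v) for v in values),
        "chl": "|".join(labels),
    }
    return "https://chart.googleapis.com/chart?" + urlencode(params)

print(pie_chart_url(["Yes", "No"], [60, 40]))
```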
The visualization API includes various types of charts, maps, tables and other options.
What’s cool: The static image chart is reasonably easy to use and features a Live Chart Playground, which allows you to tweak code and see your results in real time.
The more robust API lets you pull data in from a Google spreadsheet. You can create icons that mix text and images for visualizations, such as this weather forecast note, and what it calls a «Google-o-meter» graphic. The Visualization API also has some of the best documentation I’ve seen for a JavaScript library.
Drawbacks: The static charts tool requires a bit more work than some of the other Web-based services, and it doesn’t always offer lots of extras in return. And for the API, as with other JavaScript libraries, coding is required, making this more of a programming tool than an end-user business intelligence application.
Skill level: Advanced beginner to expert.
Runs on: Any Web browser.
Learn more: See Getting Started With Charts and Interactive Charts. There are also samples in the Google Visualization API Gallery.
JavaScript InfoVis Toolkit
What it does: InfoVis is probably not among the best-known JavaScript visualization libraries, but it’s definitely worth a look if you’re interested in publishing interactive data visualizations on the Web. The White House agrees: InfoVis was used to create the Obama administration’s Interactive Budget graphic.
What sets this tool apart from many others is the highly polished graphics it creates from just basic code samples. InfoVis creator Nicolas García Belmonte, senior software architect at Sencha Inc., clearly cares as much about aesthetic design as he does about the code, and it shows.
What’s cool: The samples are gorgeous and there’s no extra coding involved to get nifty fly-in effects. You can choose to download code for only the visualization types you want to use to minimize the weight of Web pages.
Drawbacks: Since this is not an application but a code library, you must have coding expertise in order to use it. Therefore, this might not be a good fit for users in an organization who analyze data but don’t know how to program. Also, the choice of visualization types is somewhat limited, and your data must be supplied in JSON format.
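Getting data into JSON is usually the first step. As a hedged sketch: the toolkit’s tree-based demos use nodes with id, name, data and children fields, and this Python function converts nested dictionaries into that shape. Check the demos’ source for exactly what each visualization expects. The «value» key inside data is a hypothetical placeholder, and the budget numbers are made up.

```python
import json

def build_tree(name, node, prefix="n"):
    """Recursively convert nested dicts into id/name/data/children nodes.

    Leaf numbers are stored under data["value"] (a hypothetical key).
    """
    if isinstance(node, dict):
        children = [
            build_tree(child, sub, f"{prefix}.{i}")
            for i, (child, sub) in enumerate(node.items())
        ]
        return {"id": prefix, "name": name, "data": {}, "children": children}
    return {"id": prefix, "name": name, "data": {"value": node}, "children": []}

# Made-up budget figures.
budget = {"Defense": {"Army": 3, "Navy": 2}, "Education": 1}
print(json.dumps(build_tree("Budget", budget), indent=2))
```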
Skill level: Expert.
Runs on: JavaScript-enabled Web browsers.
Learn more: See demos with source code.
Protovis
What it does: Billed as a «graphical toolkit for visualization,» this project from Stanford University’s Visualization Group is one of the more popular JavaScript libraries for turning data into visuals; it’s designed to balance simplicity with control over the display.
What’s cool: One of the best things about Protovis is how well it’s documented, with plenty of examples featuring visualization and sample code. There are also a large number of sample visualization types available, including maps and some statistical analyses. This is a robust tool, capable of building graphics like this color-coded U.S. map with timeline slider.
Drawbacks: As is the case with other JavaScript libraries, it’s pretty much essential for users to have knowledge of JavaScript (or at least some other programming language). While it’s possible to copy, paste and modify code without really understanding what it’s doing, I find it difficult to recommend that approach for nontechnical end users.
Skill level: Expert.
Runs on: JavaScript-enabled Web browsers.
Learn more: Try the How-to: Get Started Guide. You can also find examples of the types of graphics you can build with Protovis at the Protovis Gallery.
GIS/mapping on the desktop
There’s a wide range of business uses for geographic information systems (GIS), ranging from oil exploration to choosing sites for new retail stores. Or, as The Miami Herald did for its Pulitzer Prize-winning coverage of Hurricane Andrew, you can compare maximum wind speeds with damage reports and building information (and perhaps discover, for example, that the worst damage didn’t happen in the areas suffering the heaviest winds, but in areas with a lot of new, shoddy construction).
Quantum GIS (QGIS)
What it does: This is full-fledged GIS software, designed for creating maps that offer sophisticated, detailed data-based analysis of geographic regions.
The best-known desktop GIS software is probably Esri’s ArcView, a robust, well-supported application that costs quite a bit of money. The open-source QGIS is an alternative to ArcView.
As OpenOffice is to Microsoft Office, QGIS is to ArcView. ArcView enthusiasts argue that Esri’s offering is a couple of years ahead of open-source alternatives, has a better-developed interface, enjoys commercial support and is better suited for print output. But QGIS users say the open-source alternative is an excellent program that does a great deal of useful GIS work — and may even be better than ArcView when it comes to generating maps for the Web, thanks to a plug-in dedicated to generating HTML image maps.
What’s cool: QGIS has an enormous amount of GIS functionality, including the ability to create maps, overlay various types of data, do spatial analysis, publish to the Web and more. It can also be enhanced with plug-ins that add support for numerous tasks, including geocoding, managing underlying table data, exporting to MySQL and generating HTML image maps.
Drawbacks: As with any sophisticated GIS application, learning to use this software entails a serious commitment of time and training. Even in hour-long hands-on sessions with first ArcView and then QGIS, I noticed things that were easier to do in the commercial option. For example, ArcView had a one-click «normalize» function to immediately calculate, say, the percentage of people 65 and over versus the total population from a data table with both columns; in QGIS, I needed to pull up a «field calculator» and create a new column with the formula to do that calculation myself.
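The «normalize» step described above is simple arithmetic, computed per row: 100 × (people 65 and over) ÷ (total population). Here is a minimal Python sketch of what the new field-calculator column holds. The field names are hypothetical, and rows with a zero total get no value rather than a division error.

```python
def add_pct_column(rows, part="pop_65_plus", total="pop_total", new="pct_65_plus"):
    """Add a percentage column computed as 100 * part / total (field names hypothetical)."""
    for row in rows:
        # Guard against empty areas so a zero total doesn't raise ZeroDivisionError.
        row[new] = round(100 * row[part] / row[total], 1) if row[total] else None
    return rows

tracts = [
    {"name": "Tract A", "pop_65_plus": 450, "pop_total": 3000},
    {"name": "Tract B", "pop_65_plus": 0, "pop_total": 0},
]
print(add_pct_column(tracts))
```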
Runs on: Linux, Unix, Mac OS X, Windows. (This is one case where installation is more complicated on OS X, since it requires manual installation of several dependencies. There’s a one-click installer for Windows.)
Skill level: Intermediate to expert.
Learn more: Timothy Barmann of The Providence Journal posted two very useful tutorials for the CAR conference that are still available: Introduction to QGIS and The Latest in Mapping With JavaScript and jQuery. Barmann also offers a sample: Rhode Island’s Ethnic Mosaic. Another resource to help you get started: QGIS Tutorial Labs from Richard E. Plant, professor emeritus at the University of California, Davis.
Note: If you’re interested in GIS and want to consider other free software options, download this PDF listing of Open Source/Non-Commercial GIS Products. And if you’re looking for a free open-source desktop GIS program that might be fairly easy to use, Jacob Fenton, director of computer-assisted reporting at American University’s Investigative Reporting Workshop, recommends taking a look at the System for Automated Geoscientific Analyses (SAGA) site. Finally, if analyzing geographic data in a conventional database sounds interesting, PostGIS «spatially enables» the PostgreSQL relational database, according to the project’s site.
Web-based GIS/mapping
Most of us are familiar with mapping tools from major companies like Google (which has a number of third-party front ends such as Map A List, an add-on that adds info to a Google Map from a spreadsheet). There are also Yahoo Maps Web Services and Bing Maps, all with APIs. But there are numerous other options from smaller organizations or lone open-source enthusiasts that were designed from the ground up to map geographic data.
OpenHeatMap
What it does: This user-friendly website generates color-coded maps; the colors change depending on underlying info such as population change or average income. It can also place markers on a map, varying the size of the markers based on a data table.
In addition to providing the Web-based service, author Pete Warden has also packaged OpenHeatMap as a jQuery plug-in for those who don’t want to rely on hosting at OpenHeatMap.com. However, not all data formats work correctly when hosted locally. «My recommended way is to embed the maps from the site,» Warden wrote via Skype chat.
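OpenHeatMap's internals aren't documented here, so the following is only a generic sketch of the idea behind a color-coded map with sized markers: each value is scaled between the data's minimum and maximum and mapped onto a color ramp, while marker size grows with the value. The two-color ramp, the 24-pixel maximum radius and the function names are arbitrary assumptions for illustration.

```python
def lerp_color(low_rgb, high_rgb, t):
    """Blend two RGB colors: t=0 returns low_rgb, t=1 returns high_rgb."""
    return tuple(round(a + (b - a) * t) for a, b in zip(low_rgb, high_rgb))


def style_values(values, low_rgb=(255, 255, 204), high_rgb=(189, 0, 38),
                 max_radius=24.0):
    """Map each value to a hex fill color and a marker radius in pixels.

    Colors are interpolated linearly between the smallest and largest value.
    Radii scale with the square root of value/max, so marker *area* is
    proportional to the value. Assumes non-negative values.
    """
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero if all values are equal
    styled = []
    for v in values:
        r, g, b = lerp_color(low_rgb, high_rgb, (v - lo) / span)
        radius = max_radius * (v / hi) ** 0.5 if hi > 0 else 0.0
        styled.append((f"#{r:02x}{g:02x}{b:02x}", radius))
    return styled


print(style_values([0, 25, 100]))
```

Scaling radius by the square root is a common design choice: readers judge markers by area, so a value four times larger gets a marker twice as wide.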
What’s cool: It is astonishingly easy to create a color-coded map from many types of location data — even IP addresses (just use the column header ip_address).
It took me about 60 seconds to create a basic map from a spreadsheet of magnitude 7 or higher earthquakes around the world since Jan. 1, 2000, then a couple of minutes more to customize the rollover box to display both date and magnitude. (You can see a larger version on OpenHeatMap.com.)
Marker transparency, size and color are extremely simple to customize; you can also upload your own marker image, and customize what appears in the tooltip rollover by adding a tooltip column to your data source.
OpenHeatMap automatically figures out and maps locations based on a wide range of place definitions, relying on how the location columns are named — «address,» «country,» «fips_code» (used by the U.S. Census Bureau), «zip_code_area» (for five-digit ZIP codes), «lat» (latitude), «lon» (longitude) and so on.
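A minimal sketch of the header-based detection described above: scan a spreadsheet's header row for the column names the article lists, and note whether a tooltip column is present. The precedence order and the function name are assumptions for illustration, not OpenHeatMap's actual code.

```python
import csv
import io

# Header names taken from the article. The precedence order (most specific
# first) is an assumption for illustration, not OpenHeatMap's documented rule.
LOCATION_HEADERS = [
    ("lat", "lon"),      # latitude/longitude pairs
    ("address",),
    ("zip_code_area",),  # five-digit ZIP codes
    ("fips_code",),      # U.S. Census Bureau FIPS codes
    ("country",),
    ("ip_address",),
]


def detect_location_columns(csv_text):
    """Return (matched location columns or None, has_tooltip_column)."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    header = [name.strip().lower() for name in header]
    has_tooltip = "tooltip" in header
    for candidate in LOCATION_HEADERS:
        if all(name in header for name in candidate):
            return candidate, has_tooltip
    return None, has_tooltip


print(detect_location_columns("lat,lon,magnitude,tooltip\n"))
```

The practical lesson is the same either way: rename your spreadsheet columns to the expected headers before uploading.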
This is a well-thought-out interface from a onetime Apple engineer. (Warden said he worked on several software projects at Apple, including Final Cut Studio.)
Drawbacks: There’s no way to delete data once it’s been uploaded (you can get around this by using a Google Spreadsheet as a data source), and editing time is limited to as long as your browser is open and you haven’t started a new map. Embedded OpenHeatMap.com-hosted maps may be slow to load.
The documentation doesn’t make it clear whether you can set where the map is centered or what the default zoom level should be; Warden told me by e-mail that the system remembers where you last positioned and zoomed the map before saving. This feature can still be buggy at times, although Warden is responsive to bug reports.
Skill level: Beginner.
Runs on: Web browsers enabled for Flash or HTML 5 Canvas.
Learn more: Its title notwithstanding, the four-minute video «How OpenHeatMap Can Help Journalists» offers a clear explanation for anyone interested in using the service. You can also view samples on the OpenHeatMap Gallery and check out this Guardian interactive map of where Facebook is used.
OpenLayers
What it does: OpenLayers is a JavaScript library for displaying map information. It’s aimed at providing functionality similar to those big companies’ code libraries — but with open-source code. OpenLayers works with OpenStreetMap and other maps, as this tutorial about use with Google shows.
Other projects build on it to add functionality or ease of use, such as GeoExt, which adds more GIS capabilities. For users who are comfortable hand-coding JavaScript and prefer not to use a commercial platform such as Google or Bing, this can be a compelling option.
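Libraries such as OpenLayers commonly draw OpenStreetMap base maps by requesting square image tiles in the Web Mercator «slippy map» scheme. As background only (this is the published OpenStreetMap tile-numbering formula, not OpenLayers source code), here is how a latitude/longitude pair maps to a tile at a given zoom level; the function name is hypothetical.

```python
import math


def latlon_to_tile(lat_deg, lon_deg, zoom):
    """Return the (x, y) slippy-map tile containing a WGS84 point.

    At zoom level z the Web Mercator world is split into 2**z by 2**z tiles,
    numbered from the top-left corner (x grows eastward, y grows southward).
    Valid for latitudes within roughly +/-85.05 degrees.
    """
    n = 2 ** zoom
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


print(latlon_to_tile(0.0, 0.0, 1))  # (1, 1)
```

Each zoom step doubles the number of tiles in each direction, which is why a map library fetches only the handful of tiles covering the visible window.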
Drawbacks: OpenLayers is not yet as developed or as easy to use as, say, Google Maps. The project page notes that it is «still undergoing rapid development.»
Skill level: Expert.
Runs on: Any Web browser.
Learn more: Try this OpenLayers Simple Example. A good sample is Ushahidi’s Haiti map.
There are other JavaScript libraries for overlaying information on maps, such as Polymaps. And there are a number of other mapping platforms, such as Google Maps, which offers numerous mapping APIs; Yahoo Maps Web Services, with its own APIs; the Bing Maps platform and APIs; and GeoCommons.
OpenStreetMap
What it does: OpenStreetMap is somewhat like the Wikipedia of the mapping world, with various features such as roads and buildings contributed by users worldwide.
What’s cool: The main attraction of OpenStreetMap is its community nature, which has led to a number of interesting uses. For example, it is compatible with the Ushahidi mobile platform used to crowdsource information after the earthquakes in Haiti and Japan. (While Ushahidi can use several different providers for the base map layer, including Google and Yahoo, some project creators feel most comfortable sticking with an open-source option.)
Drawbacks: As with any project accepting public input, there can be issues with contributors’ accuracy at times (such as the helicopter landing pad someone once placed in my neighborhood, which is actually quite a few miles away). To be fair, though, I’ve encountered more than one business listing on Google Maps that was woefully out of date. In addition, the general look and feel of the maps isn’t quite as polished as that of commercial alternatives.
Skill level: Advanced beginner to intermediate.
Runs on: Any Web browser.
Learn more: See the Quick Tutorial on the OpenLayers site.
Temporal data analysis
If time is an important component of your data, traditional timeline visualizations may show patterns, but they don’t allow for sophisticated analysis or a great deal of interaction. That’s where this project comes in.
TimeFlow
What it does: This desktop software is for analyzing data points that involve a time component. In a demo I wrote about last summer, creators Fernanda Viégas and Martin Wattenberg — the pair behind the Many Eyes project who are now working at Google — showed how TimeFlow can generate visual timelines from text files, with entries color- and size-coded for easy pattern spotting. It also allows the information to be sorted and filtered, and it gives some statistical summaries of the data.
What’s cool: TimeFlow makes it incredibly easy to interact with data in various ways, such as switching views or filtering by criteria like a date range or a magnitude threshold (earthquakes of magnitude 8 or more, for instance). The timeline view offers a slider so you can zero in on a time period. While many applications can plot bar graphs, fewer also offer calendar views. And unlike Web-based Google Fusion Tables, TimeFlow is a desktop application that makes it quick and painless to edit individual entries.
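TimeFlow does this filtering through its point-and-click interface. As a rough Python sketch of the same operations (a date-range filter, a magnitude threshold and a simple statistical summary), assuming events are stored as hypothetical (date, magnitude) pairs:

```python
from datetime import date
from statistics import mean


def filter_events(events, start, end, min_magnitude=None):
    """Keep (date, magnitude) events inside [start, end] and, optionally,
    at or above a magnitude threshold."""
    return [
        (d, m) for d, m in events
        if start <= d <= end and (min_magnitude is None or m >= min_magnitude)
    ]


def summarize(events):
    """Count, minimum, maximum and mean magnitude of the given events."""
    mags = [m for _, m in events]
    if not mags:
        return {"count": 0}
    return {"count": len(mags), "min": min(mags), "max": max(mags), "mean": mean(mags)}


# Made-up placeholder records, not real earthquake data.
events = [(date(2001, 1, 1), 7.0), (date(2003, 1, 1), 8.0), (date(2009, 1, 1), 8.5)]
big = filter_events(events, date(2002, 1, 1), date(2010, 12, 31), min_magnitude=8.0)
print(summarize(big))
```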
Drawbacks: This is an alpha release designed to help individual reporters doing investigative work. There are no facilities for publishing or sharing results other than taking a screen snapshot, and additional development appears unlikely in the near future.
Skill level: Beginner.
Runs on: Desktop systems running Java 1.6, including Windows and Mac OS X.
Learn more: Check out Top tips.
Note: If you’re looking to publish visualized timelines, better options include Google Fusion Tables, VIDI or the SIMILE Timeline widget.
Text/word clouds
Some data visualization geeks think word clouds are either not very serious or not very original. You can think of them as the tiramisu of visualizations — once trendy, now overused. But I still enjoy these graphics that display each word from a text file once, with the size of the words varying depending on how often each one appears in the source.
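The core logic of a word cloud is small: count each word, then scale its font size by its count. A minimal sketch, assuming linear scaling between an arbitrary minimum and maximum point size; real generators may use other scalings and layout rules.

```python
from collections import Counter


def font_sizes(words, min_pt=10, max_pt=60):
    """Map each distinct word to a font size that grows linearly with its count."""
    counts = Counter(words)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # every word equally common: all get min_pt
    return {w: min_pt + (max_pt - min_pt) * (c - lo) / span for w, c in counts.items()}


print(font_sizes("data map data chart data map".split()))
# {'data': 60.0, 'map': 35.0, 'chart': 10.0}
```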
IBM Word-Cloud Generator
What it does: Several tools mentioned previously can create word clouds, including Many Eyes and the Google Visualization API, as well as the website Wordle (which is a handy tool for making word clouds from websites instead of text files). But if you’re looking for easy desktop software dedicated to the task, IBM’s free Word-Cloud desktop application fits the bill.
What’s cool: This is a quick, fun and easy way to find frequency of words in text.
Drawbacks: Because it’s trying to ignore words such as «a» and «the,» the basic configuration can miss some important terms. In my tests, it didn’t know the difference between «it» and «IT,» and completely missed «AT&T.»
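The misses described above come down to tokenizing and stop-word choices. This is a minimal sketch, not IBM's implementation: the regular expression keeps ampersands inside tokens so «AT&T» survives, and multi-letter all-caps tokens are treated as acronyms so «IT» isn't folded into the stop word «it». The stop-word list is a tiny illustrative assumption.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; real tools ship much longer ones.
STOP_WORDS = {"a", "an", "and", "at", "it", "of", "the", "to"}

# Runs of letters/digits, optionally joined by "&" or an apostrophe,
# so "AT&T" and "don't" stay single tokens.
TOKEN = re.compile(r"[A-Za-z0-9]+(?:[&'][A-Za-z0-9]+)*")


def word_counts(text):
    """Count words, dropping stop words but keeping all-caps acronyms."""
    counts = Counter()
    for token in TOKEN.findall(text):
        # Treat multi-letter all-caps tokens ("IT", "AT&T") as acronyms and keep
        # their case; fold everything else to lowercase ("It" -> "it").
        key = token if len(token) > 1 and token.isupper() else token.lower()
        if key not in STOP_WORDS:
            counts[key] += 1
    return counts


print(word_counts("It said the IT budget at AT&T grew. It grew again."))
```

The acronym rule is a heuristic with its own failure mode: text set entirely in capitals, such as a shouted headline, would be counted as a string of acronyms.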
Skill level: Advanced beginner. This app runs on the command line, so users should be able to find file paths and plug them into a sample command.
Runs on: Windows, Mac OS X and Linux running Java.
Learn more: Check the examples that come with the download.
Social and other network analysis
These tools use a pre-Facebook/Twitter definition of «social network analysis» (SNA), referring to the discipline of finding connections between people based on various data sets. Investigative journalists have used such tools to, for example, find links between people who are involved in development projects or who are members of various boards of directors.
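Linking people who sit on the same boards is a classic two-mode network problem: start from person-to-organization memberships, then connect any two people who share an organization. A minimal sketch of that projection step, using hypothetical names; the dedicated tools below do this plus the layout and statistics.

```python
from collections import defaultdict
from itertools import combinations


def co_membership_links(memberships):
    """Turn (person, organization) pairs into weighted person-to-person links.

    Two people get one unit of link weight for every organization they share.
    """
    members = defaultdict(set)
    for person, org in memberships:
        members[org].add(person)
    links = defaultdict(int)
    for people in members.values():
        for a, b in combinations(sorted(people), 2):
            links[(a, b)] += 1
    return dict(links)


# Hypothetical people and boards.
memberships = [
    ("Ann", "Board X"), ("Bob", "Board X"),
    ("Ann", "Board Y"), ("Bob", "Board Y"), ("Cy", "Board Y"),
]
print(co_membership_links(memberships))
# {('Ann', 'Bob'): 2, ('Ann', 'Cy'): 1, ('Bob', 'Cy'): 1}
```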
An understanding of statistical theories of network node analysis is necessary in order to use this category of software. Since I’ve only had a very basic introduction to that discipline, this is one category of tools I did not test hands-on. But if you’re seeking software to do such analysis, one of these might meet your needs.
Gephi
What it does: Billed as a Photoshop for data, this open-source beta project is designed for visualizing statistical information, including relationships within networks of up to 50,000 nodes and half a million edges (connections or relationships) as well as network analyses of factors such as «betweenness,» closeness and clustering coefficient.
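Measures such as closeness and clustering coefficient have standard textbook definitions. As a sketch of two of them, computed from scratch on a small hypothetical undirected graph (Gephi's own implementation may differ in details, such as how it handles nodes that can't reach each other):

```python
from collections import deque


def closeness(adj, source):
    """Closeness centrality: (reachable nodes - 1) / sum of shortest-path
    distances from source, found by breadth-first search (unweighted edges)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def clustering(adj, node):
    """Local clustering coefficient: share of neighbor pairs that are linked."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    linked = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
    return 2 * linked / (k * (k - 1))


# Hypothetical graph: triangle a-b-c, plus d hanging off c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(closeness(adj, "c"), closeness(adj, "d"), clustering(adj, "a"))
```

Node c scores highest on closeness because it is one hop from everyone, while a's neighbors are fully connected to each other, giving it a clustering coefficient of 1.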
Runs on: Windows, Linux, Mac OS X running Java 1.6.
Learn more: Try this Quick Start tutorial (PDF).
NodeXL
What it does: This Excel plug-in displays network graphs from a given list of connections, helping you analyze and see patterns and relationships in the data.
NodeXL merges the older and current definitions of SNA. It’s «optimized for analyzing online social media — it includes built-in connections to query the APIs of Twitter, Flickr and YouTube, allowing you to draw networks of users and their activity,» according to Peter Aldhous, San Francisco bureau chief for New Scientist magazine.
It also handles e-mail and conventional network analysis files (including data created by the popular — but not free — analysis tool UCINET).
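For e-mail data, the underlying structure is a directed network: each message is an edge from sender to recipient. As a rough sketch of that step (not NodeXL's code; the function name and sample log are hypothetical), here is how a sender/recipient list becomes weighted edges plus in- and out-degree counts:

```python
from collections import Counter


def email_network(messages):
    """Build a directed, weighted edge list from (sender, recipient) pairs.

    Returns the edge weights (messages per sender->recipient pair) plus each
    person's out-degree and in-degree, counted as distinct correspondents.
    """
    edges = Counter(messages)
    out_degree = Counter(sender for sender, _ in edges)
    in_degree = Counter(recipient for _, recipient in edges)
    return edges, out_degree, in_degree


# Hypothetical message log.
log = [("ann", "bob"), ("ann", "bob"), ("ann", "cy"), ("bob", "ann")]
edges, out_deg, in_deg = email_network(log)
print(edges[("ann", "bob")], out_deg["ann"], in_deg["ann"])  # 2 2 1
```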
Runs on: Excel 2007 and 2010 on Windows.
Learn more: Download this detailed free NodeXL tutorial (PDF) or these basic step-by-step instructions on analyzing your own Facebook social network (PDF). One Facebook app for downloading your own friend information for use in NodeXL is Name Gen Web.
Sharon Machlis is online managing editor at Computerworld. Her email address is smachlis@computerworld.com. You can follow her on Twitter @sharon000, on Facebook or by subscribing to her RSS feeds for articles and blogs.