Data URI scheme

The Data URI Scheme is a method of including (potentially external) data in-line in a web page or resource.

For example, the usual way of referencing an image (which is almost always a separate file from the page you’ve loaded) would be either HTML:

[html]<img src="/assets/images/core/flagsprite.png" alt="flags" />[/html]

or css:

[css]background:url(/assets/images/core/flagsprite.png)[/css]

However, this remote image (or other resource) can be base64 encoded and included directly in the HTML or CSS using the data URI scheme:

[html]<img src="data:image/png;base64,
iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAABGdBTUEAALGP
C/xhBQAAAAlwSFlzAAALEwAACxMBAJqcGAAAAAd0SU1FB9YGARc5KB0XV+IA
AAAddEVYdENvbW1lbnQAQ3JlYXRlZCB3aXRoIFRoZSBHSU1Q72QlbgAAAF1J
REFUGNO9zL0NglAAxPEfdLTs4BZM4DIO4C7OwQg2JoQ9LE1exdlYvBBeZ7jq
ch9//q1uH4TLzw4d6+ErXMMcXuHWxId3KOETnnXXV6MJpcq2MLaI97CER3N0
vr4MkhoXe0rZigAAAABJRU5ErkJggg==">[/html]

or

[css]background:url(data:image/png;base64,
iVBORw0KGgoAAAANSUhEUgAAABAAAAAQAQMAAAAlPW0iAAAABlBMVEUAAAD/
//+l2Z/dAAAAM0lEQVR4nGP4/5/h/1+G/58ZDrAz3D/McH8yw83NDDeNGe4U
g9C9zwz3gVLMDA/A6P9/AFGGFyjOXZtQAAAAAElFTkSuQmCC[/css]
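As a rough sketch of how such a value is put together in C#, the helper below base64-encodes a byte array and prefixes it with the media type. `DataUriBuilder.ToDataUri` is a hypothetical name made up for this illustration, and the sample input is only the 8-byte PNG file signature rather than a complete image.

[csharp]using System;

static class DataUriBuilder
{
    // Hypothetical helper: wraps raw bytes as "data:<media type>;base64,<payload>"
    public static string ToDataUri(byte[] bytes, string mediaType)
    {
        return "data:" + mediaType + ";base64," + Convert.ToBase64String(bytes);
    }
}

class DataUriDemo
{
    static void Main()
    {
        // the 8-byte PNG file signature, used here only as sample input
        var pngSignature = new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

        // prints "data:image/png;base64,iVBORw0KGgo="
        Console.WriteLine(DataUriBuilder.ToDataUri(pngSignature, "image/png"));
    }
}[/csharp]

Both encoded images above start with the same "iVBORw0KGgo" characters, because every PNG file begins with that signature.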

So, if you fancy cutting down on the number of HTTP requests required to load a page whilst massively increasing the size of your CSS and HTML downloads, then why not look into the data URI scheme to actually include images in your CSS/HTML files instead of referencing them?!

Sounds crazy, but it just might work.
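To put a number on that size increase: base64 turns every 3 bytes of input into 4 characters of text, padding the final group. The sketch below compares that formula with what `Convert.ToBase64String` actually produces. The class name, the `EncodedLength` helper and the sample sizes are made up for illustration.

[csharp]using System;

class Base64Overhead
{
    // base64 text length for a given byte count: 4 characters per (padded) 3-byte group
    public static int EncodedLength(int byteCount)
    {
        return 4 * ((byteCount + 2) / 3);
    }

    static void Main()
    {
        foreach (var size in new[] { 300, 3000, 30000 })
        {
            // encode a zero-filled buffer of that size and measure the result
            var actual = Convert.ToBase64String(new byte[size]).Length;
            Console.WriteLine("{0} bytes -> {1} characters (formula says {2})",
                size, actual, EncodedLength(size));
        }
    }
}[/csharp]

So an embedded image costs about a third more bytes than the original file, before any compression the server applies to the CSS or HTML.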

Using the code below you can recursively traverse a directory for CSS files with "url(" image references in them, download the images, encode them, and inject the encoded images back into the CSS files. The idea is that this little proof of concept will let you compare the number of HTTP requests and the full page download size between referencing multiple external resources (normal) and referencing fewer, bigger resources (data URI).

Have a play, why don’t you:

[csharp highlight="72,73,75"]using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Net;

namespace Data_URI
{
    class Data_URI
    {
        static void Main(string[] args)
        {
            try
            {
                var rootPath = @"D:\WebSite\";

                // css file specific stuff
                var cssExt = "*.css";
                // RegEx "url(...)", optionally quoted; group 1 is the path
                var cssPattern = @"url\(['""]?([a-zA-Z0-9_.\-:/]+)['""]?\)";
                // new structure to replace "url(...)" with
                var cssReplacement = "url(data:{0};base64,{1})";

                // recursively get all files matching the extension specified
                foreach (var file in Directory.GetFiles(rootPath, cssExt, SearchOption.AllDirectories))
                {
                    Console.WriteLine(file + " injecting");

                    // read the file
                    var contents = File.ReadAllText(file);

                    // get the new content (with injected images)
                    // match css referenced images: "url(/blah/blah.jpg);"
                    var newContents = GetAssetDataURI(contents, cssPattern, cssReplacement);

                    // overwrite file if it's changed
                    if (newContents != contents)
                    {
                        File.WriteAllText(file, newContents);
                        Console.WriteLine(file + " injected");
                    }
                    else
                    {
                        Console.WriteLine(file + " no injecting required");
                    }
                }

                Console.WriteLine("** DONE **");
                Console.ReadKey();
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
                Console.ReadKey();
            }
        }

        static string GetAssetDataURI(string fileContents, string pattern, string replacement)
        {
            return Regex.Replace(fileContents, pattern, new MatchEvaluator(delegate(Match match)
            {
                string assetUrl = match.Groups[1].ToString();

                // check for relative paths (assumes site-root-relative, e.g. "/assets/x.png")
                if (!assetUrl.StartsWith("http://", StringComparison.OrdinalIgnoreCase) && !assetUrl.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
                    assetUrl = "http://mywebroot.example.com" + assetUrl;

                try
                {
                    // get the image, encode, build the new css content
                    using (var client = new WebClient())
                    {
                        var base64Asset = Convert.ToBase64String(client.DownloadData(assetUrl));
                        var contentType = client.ResponseHeaders["content-type"];

                        return String.Format(replacement, contentType, base64Asset);
                    }
                }
                catch (Exception e)
                {
                    // usually a 404 for a badly referenced image; keep the original
                    // reference so the other matches in this file are still injected
                    Console.WriteLine("Error fetching " + assetUrl + ": " + e.Message);
                    return match.Value;
                }
            }));
        }
    }
}[/csharp]

The key lines are highlighted: they download the referenced resource, convert it to a byte array, encode that as base64, and generate the new css.
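If you want to see the substitution step on its own, without a web server, here is a minimal sketch that swaps the download for a stand-in function. `InlineSketch`, `Inline` and the `fetch` parameter are names invented for this example, and the three "image" bytes are arbitrary.

[csharp]using System;
using System.Text.RegularExpressions;

class InlineSketch
{
    // Replaces each url(...) reference with a data URI built from whatever
    // 'fetch' returns for that path: the bytes and their content type.
    public static string Inline(string css, Func<string, Tuple<byte[], string>> fetch)
    {
        return Regex.Replace(css, @"url\(([a-zA-Z0-9_.:/]*)\)", match =>
        {
            var asset = fetch(match.Groups[1].Value);
            return String.Format("url(data:{0};base64,{1})",
                asset.Item2, Convert.ToBase64String(asset.Item1));
        });
    }

    static void Main()
    {
        var css = "body{background:url(/img/a.png)}";

        // fake "download": three arbitrary bytes, labelled as a PNG
        var result = Inline(css, path => Tuple.Create(new byte[] { 1, 2, 3 }, "image/png"));

        // prints "body{background:url(data:image/png;base64,AQID)}"
        Console.WriteLine(result);
    }
}[/csharp]

Passing the download in as a function is only a convenience for experimenting; the listing above does the same substitution with a real HTTP request per match.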

This practice probably isn’t very useful for swapping out img references in HTML, since the image data is then sent again with every page and you lose out on browser caching and on static assets cached in CDNs. It may be more useful for images referenced in CSS files, since those are static files themselves which can be minified, pushed to CDNs, and take advantage of browser caching.

Comments welcomed.