Saturday, February 13, 2010

Squeezing out performance from Charting

After my review of the charting products available currently I decided to go with the Slverlight/WPF Data Visualization project (from the WPF Toolkit). I did end up coming up with a trick to squeeze some performance out of the charts. Quite simply that charts don’t handle a lot of data very well. When you have rich data templates, animation and more than several hundred data points, performance is pretty poor. I decided that the first thing to compromise on would be animation. It is nice but I would rather speed. By turning off the animation in the charting I get a little speed up but there still is a simple truth that must be considered. If my graph is only 500px wide then why try to render more than 500 data points? This is my first step to gaining some performance; filter out data by sampling so we never try to get the chart to render more data than it ever could.

I can achieve this by creating a custom CollectionViewSource. The new sub class is simple;

  • it has a dependency property of MaxItemCount
  • on any change to the MaxItemCount or the Source we sample the data set, and save the values we want to display into a set
  • we subscribe to the Filter event and only accept an item if it is included in our sample set
public class CollectionSizeFilter : CollectionViewSource
{
    int _count;
    ICollectionView _defaultView;
    HashSet<object> _toKeep;

    public CollectionSizeFilter()
    {
        Filter += CollectionSizeFilter_Filter;
    }

    protected virtual void CollectionSizeFilter_Filter(object sender, FilterEventArgs e)
    {
        e.Accepted = _toKeep == null || _toKeep.Contains(e.Item);
    }

    protected override void OnSourceChanged(object oldSource, object newSource)
    {
        base.OnSourceChanged(oldSource, newSource);
        _defaultView = GetDefaultView(newSource);
        _count = Count(_defaultView.SourceCollection);

        LoadHashset();
    }

    public double MaxItemCount
    {
        get { return (double)GetValue(MaxItemCountProperty); }
        set { SetValue(MaxItemCountProperty, value); }
    }
    public static readonly DependencyProperty MaxItemCountProperty = DependencyProperty.Register("MaxItemCount", typeof(double), typeof(CollectionSizeFilter), new UIPropertyMetadata(1d, MaxItemCountProperty_Changed));

    private static void MaxItemCountProperty_Changed(DependencyObject sender, DependencyPropertyChangedEventArgs e)
    {
        var self = (CollectionSizeFilter)sender;
        self.LoadHashset();
    }

    private void LoadHashset()
    {
        if (_count <= MaxItemCount)
        {
            _toKeep = null;
        }
        else
        {
            _toKeep = new HashSet<object>();
            var gap = MaxItemCount - 1;
            var spacing = _count / gap;
            double nextIndex = 0d;
            int i = 0;
            foreach (var item in _defaultView.SourceCollection)
            {
                if (i >= nextIndex)
                {
                    _toKeep.Add(item);
                    nextIndex += spacing;
                }
                i++;
            }
        }
        if (View != null)
            View.Refresh();
    }

    private static int Count(IEnumerable source)
    {
        if (source == null)
        {
            return 0;
        }
        var is2 = source as ICollection;
        if (is2 != null)
        {
            return is2.Count;
        }
        int num = 0;
        IEnumerator enumerator = source.GetEnumerator();
        {
            while (enumerator.MoveNext())
            {
                num++;
            }
        }
        return num;
    }
}

To use the new “control” we just treat it like a normal CollectionViewSource but we specifiy the MaxItemCount by binding it to the width of the Chart like this.

<Controls:CollectionSizeFilter x:Key="FilteredData" 
    Source="{Binding MyData}" 
    MaxItemCount="{Binding ElementName=chart1, Path=ActualWidth}"/>

This gave some performance improvements but I thought why not follow this concept down the path a little bit more. For most data that  I want to display I am mainly interested in the trend not the minutia. So why do I try to show a data point on every pixel? I could just sample the data; for example if I have a data set of 1500 data points and a graph that is 500px wide, I get some performance gains by reducing the rendered data set to 500 data points, but why not just show one data point every say 10px? If this is acceptable for your data then you can reduce your rendered data set from 1500 down to 50 (30 times less data to render). To do this is even more simple than the code above. We just need to create and implementation of IValueConverter to do some division, a DivisionConverter.

public sealed class DivisionConverter : IValueConverter
{
    #region IValueConverter Members
    public object Convert(object value, Type targetType, object parameter, CultureInfo culture)
    {
        double numerator = ConvertToDouble(value, culture);
        double denominator = ConvertToDouble(parameter, culture);
        return numerator / denominator;
    }

    public object ConvertBack(object value, Type targetType, object parameter, CultureInfo culture)
    {
        throw new NotImplementedException();
    }
    #endregion

    private static double ConvertToDouble(object value, IFormatProvider culture)
    {
        var result = default(double);
        try
        {
            var source = value as IConvertible;
            if (source != null)
                result = source.ToDouble(culture);
        }
        catch
        {
        }
        return result;
    }
}

So now we can just extend our previous XAML to include the new converter and apply our sample size of 10.

<Controls:DivisionConverter x:Key="divisionConverter" />
<Controls:CollectionSizeFilter x:Key="FilteredData" 
    Source="{Binding MyData}"
    MaxItemCount="{Binding ElementName=chart1, Path=ActualWidth, 
        Converter={StaticResource divisionConverter}, 
        ConverterParameter=10}"/>

Putting it all together I end up with code like below. I prefer to keep my concerns near each other, so here I have defined the filters in the LineSeries Resources instead of where some may put them which is the top of the file. I would however define the key to the DivisionConverter at the top of the file or even in the App.xaml as I will probably use it in many places. Note how I use RelativeSource Binding to find the width of the parent chart.

<chartingToolkit:Chart Name="chart1">
    <chartingToolkit:LineSeries 
        Title="{Binding Title}"
        DependentValuePath="Value"
        IndependentValuePath="Date">
        <chartingToolkit:LineSeries.Resources>
            <Controls:CollectionSizeFilter 
                x:Key="FilteredBalances" 
                Source="{Binding Balances}"
                MaxItemCount="{Binding 
                        RelativeSource={RelativeSource FindAncestor, 
                        AncestorType={x:Type chartingToolkit:Chart}}, 
                        Path=ActualWidth,
                        Converter={StaticResource DivisionConverter}, 
                        ConverterParameter=10}"/>
        </chartingToolkit:LineSeries.Resources>
        <chartingToolkit:LineSeries.ItemsSource >
            <Binding Source="{StaticResource FilteredBalances}"/>
        </chartingToolkit:LineSeries.ItemsSource>
    </chartingToolkit:LineSeries>
</chartingToolkit:Chart>

You may find your millage varies with the sampling size. You may be only comfortable with small values like 3-5 or you may be more aggressive with values around 30. It is your data and you will know what is best for you. The real beauty of this is that problem of performance is a presentation problem. With some simple controls we are able to tame the problem in the presentation layer without having to compromise the purity of our ViewModels (like setting max size values that get sent to databases). Here if we resize the chart, we already have all the data so we just render more data points. Also the solution is very generic, there are no dependencies on the WPF Data Visualization assemblies so you may find other uses for them.

2 comments:

Delay said...

Great stuff, Lee! I've just added a link to the comments of my Charting links post (http://cesso.org/r/DVLinks) and will move it up into the body next time I update that content.

My one comment here is that sometimes it may be even more useful to do something a little more sophisticated than blindly sampling the original data. The classic example here is a brief spike in the data - blind sampling may miss the spike, but that spike may well be the most interesting part of the data. :)

Thanks again - keep up the great work!

Delay said...

Maybe also have a look at this comment I just got for an example of an algorithm that performs the "smarter sampling" I mention:

http://blogs.msdn.com/delay/archive/2010/01/13/i-feel-the-need-the-need-for-speed-seven-simple-performance-boosting-tweaks-for-common-silverlight-wpf-charting-scenarios.aspx#9963339