Building Scalable Ecommerce Solutions Part II

Posted in .NET Development, Web Development, eCommerce by Bridgeline Digital on March 14th, 2009

Based on the number of inquiries about my last post (Building Scalable Ecommerce Solutions), I decided to follow up with a more in-depth discussion of one of the topics I touched on previously. In this post I will concentrate on Data Caching.

In my opinion Data Caching is the single most beneficial strategy a System Engineer or a Developer can implement in order to achieve scalability. In this article I will discuss a few guidelines for implementing an effective Data Caching strategy.

Data Caching is a method by which data is stored in and retrieved from transient memory. Transient memory, as distinguished from persistent memory, is storage that is normally short-lived: it is reset when the server or computer on which it resides is rebooted, or when its contents are replaced with other transient data. Transient memory is the fastest medium a modern computer offers for storing and retrieving data.

We often refer to RAM (Random Access Memory) as transient memory. When we work with data-driven software (whether a Word document, a spreadsheet or a Dot Net object) we often place the data in transient memory in order to manipulate it. As mentioned, data access to and from RAM is extremely fast. However, without saving it to disk (persistent storage) we run the risk of losing all our data upon a reboot or a software failure.

Accessing data from persistent storage such as a file system or a Relational Database Management System (RDBMS; SQL Server, for example) is much slower but safer. To underscore the difference between the two memory models with a real-life scenario, think about how long it takes to open a large MS Word document compared with typing content into it once it is already open. Imagine what would happen if you needed to save and reopen your document every time you typed a character. It would undoubtedly be extremely slow and inefficient!

A good candidate for cached data in a software application is ‘read-only’ data that doesn’t change very often. In a typical e-commerce scenario this can be a product name and description, the category hierarchy, images, or promotional marketing text. The cached data can still originate from the database, but it doesn’t require a database transaction every time it is accessed. Modern database engines provide their own internal data caching, so that queries which are called repeatedly to retrieve the same unmodified data can be served from transient memory. In my opinion, it is sort of a way for the database engine to compensate for bad programming behavior. In a perfect world, deciding when data should be cached should truly be the responsibility of the developer or the programmer analyst.

So, how does the cache really work? A cache is built to support multiple threads accessing its data simultaneously. Typically, it has a multiple-read (or unblocked read) mechanism and a single write. In other words, it allows multiple threads (web connections or web browsers) to read from the cached data simultaneously; however, when a write occurs (updated data populates the cache), all the ‘reads’ are blocked, waiting for the ‘write’ to complete. Synchronization techniques such as a Critical Section, Mutex or Semaphore allow multiple threads to work in harmony, as many can read but only one can write at any given time. Most common programming languages provide high-level frameworks for multi-threaded implementation. However, these tasks should normally be given to the more experienced and responsible software professionals. Multi-threaded programming is not a walk in the park and, if coded incorrectly, can result in bugs that are very hard to isolate and fix.
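The many-readers, single-writer behavior described above can be sketched with a reader-writer lock. This is a minimal illustration rather than a production cache, and it is written in Java for brevity; the class name ReadMostlyCache and its methods are hypothetical. In the Dot Net Framework the corresponding lock type would be ReaderWriterLockSlim.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical cache: any number of concurrent readers, one writer at a time.
class ReadMostlyCache<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Readers share the read lock, so they do not block one another.
    V get(K key) {
        lock.readLock().lock();
        try {
            return entries.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // A writer takes the exclusive write lock; readers wait until it is released.
    void put(K key, V value) {
        lock.writeLock().lock();
        try {
            entries.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

The try/finally blocks matter: if an exception escaped while a lock was still held, every later reader or writer would wait forever, which is exactly the kind of bug that is hard to isolate.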

So what are we gaining here? We are gaining efficiency in database connections and in the number of database queries. To clarify, by using data caching we will normally have one DB connection interfacing with the database and populating the cache. All threads (the clients accessing data) communicate with the cache and do not use any DB connections. Without the cache in place, each thread would repeatedly read from and write to the database. A large number of DB connections will negatively affect performance, and the problem worsens quickly as the number of concurrent users grows.
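To make the saving in database queries concrete, here is a small sketch in Java, under stated assumptions: FakeProductDatabase stands in for a real database and simply counts how often it is queried, and ReadThroughCache is a hypothetical wrapper that loads a value on the first request and serves it from memory afterwards. Note that this variant loads on demand from whichever thread asks first, which is a slightly different design from a single dedicated connection populating the cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-in for a database; it only counts the queries it receives.
class FakeProductDatabase {
    final AtomicInteger queryCount = new AtomicInteger();

    String loadProductName(String sku) {
        queryCount.incrementAndGet();
        return "Product " + sku;
    }
}

// Hypothetical read-through cache: the loader runs only when a key is missing.
class ReadThroughCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // computeIfAbsent calls the loader at most once per missing key.
        return entries.computeIfAbsent(key, loader);
    }
}
```

With this in place, a thousand calls to get("sku-1") would reach the fake database once; without the cache, each call would be a separate query.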

Of course, if your cached data is modified frequently, the cache loses its effectiveness, since each write blocks all the reads until it is finished. One more advantage of keeping data in the cache is that you are able to keep business objects in their natural structure, rather than converting objects from their natural form (a complex hierarchical structure) to DB format (columns and rows) and back again. These conversions are expensive in both time and CPU resources. By keeping your objects in their natural format in the cache and segregating your data into well-defined structures, you can gain even finer locking granularity, in which a modification to an Object locks only that Object and not an entire collection of Objects. This is similar to row-level locking in a database, where updating a single row locks only the row being updated and not the entire table.
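One way to picture this finer locking granularity is to give each cached object its own lock, so that updating one product never makes readers of another product wait. The sketch below is in Java and rests on assumptions: CachedProduct, ProductCache and their fields are hypothetical names chosen for illustration, and prices are held in cents to keep the example simple.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical business object that carries its own reader-writer lock.
class CachedProduct {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private String name;
    private long priceInCents;

    CachedProduct(String name, long priceInCents) {
        this.name = name;
        this.priceInCents = priceInCents;
    }

    // Reads both fields under the read lock so they are seen consistently.
    String describe() {
        lock.readLock().lock();
        try {
            return name + " (" + priceInCents + " cents)";
        } finally {
            lock.readLock().unlock();
        }
    }

    // Locks only this product; other products in the cache stay readable.
    void update(String newName, long newPriceInCents) {
        lock.writeLock().lock();
        try {
            name = newName;
            priceInCents = newPriceInCents;
        } finally {
            lock.writeLock().unlock();
        }
    }
}

// Hypothetical cache keyed by SKU; the map handles its own thread safety.
class ProductCache {
    private final Map<String, CachedProduct> bySku = new ConcurrentHashMap<>();

    void add(String sku, CachedProduct product) {
        bySku.put(sku, product);
    }

    CachedProduct find(String sku) {
        return bySku.get(sku);
    }
}
```

The trade-off is one lock per object instead of one per collection, which costs a little memory but keeps a single update from stalling unrelated reads.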

Business Objects are normally stored in a cached dictionary. You reach a Business Object by looking up its key in the dictionary. Key access is very fast because the dictionary is backed by a hash table, so lookup time stays roughly constant as the dictionary grows. This is all handled by the Dot Net Framework. Once you have your Business Object you can access its attributes and methods using the usual dot notation, “Object.Attribute”, as you would normally do in C# or VB Dot Net.

There are plenty of techniques which will allow you to unleash the power of your cache but it is important to plan well, and only allow experienced developers to handle the cache implementation. Cache coding requires special attention to thread synchronization and locking techniques.

With the right data caching mechanism in place you will give your application the scalability it needs to meet your growing demand.

Written by Erez Katz

2 Responses to “Building Scalable Ecommerce Solutions Part II”

  1. Jordan McMillan

    Excellent post.
    I have been following your articles.
    Can’t wait to read the next one.

  2. Cart Man

    You guys have some great articles
