By Justin Strong
Numonyx Software Marketing Manager
When deciding which flash memory technology is appropriate for a particular application, many designers use the datasheet as their main reference. The reasoning goes like this: if flash device A has a write speed of 3 MB/s and flash device B has a write speed of 1.5 MB/s, then device A obviously performs better and my system will run faster with device A. If only evaluating performance were that simple! Performance has many different aspects, and while datasheet specifications are useful, they are not enough. This article explores the steps to take to use system-level benchmarking effectively when evaluating the performance of flash memory devices.
While datasheet specifications have their place, they do not show how well a device will perform at the application level, which is what really matters when evaluating performance. As an example, assume we are evaluating the performance of flash devices A and B. Flash device A performs much better than device B when writing large data chunks, while flash device B performs much better than device A for small and random writes. Which device performs better overall depends on the usage scenario. If the application mostly writes large chunks of data, such as a data card in an MP3 player, device A is most likely the better choice. If the application performs a significant number of small data writes, flash device B may be the better choice.
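The device comparison above can be made concrete with a simple cost model. The sketch below is a hypothetical illustration, not a vendor model: it assumes each write costs a fixed per-operation overhead plus the data size divided by a sustained streaming rate. All overhead and rate values are invented for illustration and do not come from any datasheet. Under these assumptions, device A wins the large-write workload and device B wins the small-write workload.

```python
from dataclasses import dataclass

@dataclass
class FlashModel:
    """Hypothetical write-cost model: fixed per-write overhead plus a streaming rate."""
    name: str
    overhead_s: float    # fixed cost per write operation in seconds (assumed value)
    rate_bytes_s: float  # sustained write rate in bytes/second (assumed value)

    def write_time(self, size: int) -> float:
        # Time for one write of `size` bytes under this simplified model.
        return self.overhead_s + size / self.rate_bytes_s

def workload_time(device: FlashModel, workload) -> float:
    """Total time for a workload given as (write_size_bytes, count) pairs."""
    return sum(count * device.write_time(size) for size, count in workload)

# Illustrative numbers only.
MB = 1_000_000
device_a = FlashModel("A", overhead_s=3e-3, rate_bytes_s=3 * MB)
device_b = FlashModel("B", overhead_s=0.2e-3, rate_bytes_s=1.5 * MB)

mp3_card = [(4 * MB, 100)]        # mostly large sequential writes
small_writes = [(4_096, 10_000)]  # many small writes

for label, wl in [("large", mp3_card), ("small", small_writes)]:
    best = min((device_a, device_b), key=lambda d: workload_time(d, wl))
    print(label, best.name)  # prints "large A", then "small B"
```

The point of the model is that the datasheet streaming rate alone (3 MB/s versus 1.5 MB/s) hides the per-operation cost that dominates small writes.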
Another consideration is the software stack used with the flash device, which can have a major impact on the efficiency of any operation involving it. Depending on the software and chipset, the throughput of a flash device at the application layer can often be less than 50 percent of the datasheet specification.

Figure 1: Application stack
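To see how the software stack can cut application-level throughput below half of the datasheet figure, the sketch below adds a fixed software cost per write (for hypothetical application, file system and driver layers) to the raw device time. The per-layer costs are assumptions for illustration only, and the model ignores chipset effects such as bus or DMA overhead.

```python
def app_level_throughput(write_size: int, device_rate: float, layer_overheads_s: dict) -> float:
    """Bytes/second seen by the application for a single write of write_size bytes.

    Simplified model: device time and software time add serially, with no overlap.
    """
    device_time = write_size / device_rate
    software_time = sum(layer_overheads_s.values())
    return write_size / (device_time + software_time)

MB = 1_000_000
datasheet_rate = 3 * MB
# Hypothetical per-write software costs in seconds, for illustration only.
stack = {"application": 0.3e-3, "file_system": 0.8e-3, "driver": 0.4e-3}

for size in (4_096, 64_000, 1 * MB):
    eff = app_level_throughput(size, datasheet_rate, stack)
    print(f"{size:>8} B: {eff / datasheet_rate:.0%} of datasheet rate")
```

With these assumed numbers, a 4 KB write reaches under 50 percent of the datasheet rate, while a 1 MB write gets close to it, because the fixed software cost is amortized over more data.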
The following steps outline a process for optimizing flash performance in a design. Following all of the steps should result in optimal performance for the desired application. If not all of the information is available, some steps may be skipped; while this may not produce optimal results, it should still have a positive impact on performance.
- Determine what applications are important for performance.
- Document how users are using the applications (use cases).
- Define tests that benchmark the use cases.
- Characterize operations in the application stack.
- Improve the application stack for the use cases.
The rest of this article documents each of these steps.
Determine what applications are important for performance
In this step, identify which applications have performance requirements and which of them have the highest-priority requirements. Beyond knowing which requirements are high priorities, it is equally important to determine what performance the user expects and what the user will accept. Not all applications have significant performance requirements. Ideally, the performance of every application would be measured; in practice, resource limitations usually force a focus on key applications.
Document how users are using the applications (use cases)
Once the applications for performance evaluation have been identified, it is critical to understand how users interact with them by answering the following questions:
- What are the typical ways users interact with the application?
- What operations are executed the most?
- What operations are the most important for performance?
- What operations have performance issues?
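Questions such as "What operations are executed the most?" can be answered from a trace recorded while users exercise the application. The sketch below assumes a hypothetical trace format of (operation, bytes) pairs; the operation names and sizes are invented for illustration. It ranks operations by frequency and totals the bytes each one moves.

```python
from collections import Counter

# Hypothetical trace of (operation, bytes) events captured during a user session.
trace = [
    ("cache_read", 2_048), ("cache_write", 6_144), ("cache_read", 1_024),
    ("cookie_write", 256), ("cache_read", 12_288), ("page_write", 48_000),
]

def op_frequency(events):
    """Return (operation, count) pairs ordered from most to least frequent."""
    return Counter(op for op, _ in events).most_common()

def bytes_per_op(events):
    """Return the total bytes moved by each operation."""
    totals = Counter()
    for op, size in events:
        totals[op] += size
    return dict(totals)

print(op_frequency(trace))
print(bytes_per_op(trace))
```

Frequency and volume answer different questions: a rare but large operation can matter as much to the user as a frequent small one, so both views are worth keeping.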
Define tests that benchmark the use cases
Once the use cases are identified, tests must be defined to measure their performance. This step is relatively easy if the use cases from the previous step are well defined. If use cases are not available, the benchmark tests should be based on an understanding of how users are likely to interact with the application and which key performance parameters will affect them.
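A benchmark test for a use case can be as simple as running it repeatedly and summarizing the timings. The sketch below is a generic harness, not a specific tool; the small-write use case (many 4 KB files written with `os.fsync`, loosely resembling a browser cache) is a hypothetical stand-in for a real use case.

```python
import os
import statistics
import tempfile
import time
from typing import Callable

def benchmark(use_case: Callable[[], None], runs: int = 10, warmup: int = 1) -> dict:
    """Run a use case repeatedly and summarize wall-clock time in milliseconds."""
    for _ in range(warmup):
        use_case()  # untimed runs to settle caches and lazy initialization
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        use_case()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "runs": runs,
        "median_ms": statistics.median(samples_ms),
        "min_ms": min(samples_ms),
        "max_ms": max(samples_ms),
    }

def make_small_write_use_case(directory: str, count: int = 20, size: int = 4_096) -> Callable[[], None]:
    """Hypothetical use case: store many small items, as a browser cache might."""
    payload = os.urandom(size)

    def run() -> None:
        for i in range(count):
            path = os.path.join(directory, f"item_{i}.bin")
            with open(path, "wb") as f:
                f.write(payload)
                f.flush()
                os.fsync(f.fileno())  # ask the OS to commit the data to storage
    return run

with tempfile.TemporaryDirectory() as tmp:
    print(benchmark(make_small_write_use_case(tmp), runs=5))
```

Reporting the median alongside the minimum and maximum makes occasional slow runs (for example, garbage collection inside the flash management software) visible instead of averaging them away.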
Characterize operations in the application stack
After performance benchmarking has been performed, assess whether performance is sufficient for each application. If it is, the process is complete until the next development cycle. If it is not, some level of optimization is needed, and the first step is to determine the source of the performance bottlenecks. This requires the ability to characterize the operations of each layer in the application stack and the interactions between layers. The application stack in this context includes both the software and the flash devices in the design. Understanding the interactions between layers matters because a layer may perform poorly as a result of the size or format of the data it receives from another layer.
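The sufficiency check can be written down explicitly by recording, for each application, the performance users expect and the worst performance they will accept. The sketch below is a hypothetical illustration; the three verdict labels and the example thresholds are assumptions, not values from this article.

```python
from dataclasses import dataclass

@dataclass
class Requirement:
    name: str
    expected_ms: float    # time the user expects (assumed target)
    acceptable_ms: float  # worst time the user will tolerate (assumed limit)

def assess(req: Requirement, measured_ms: float) -> str:
    """Classify a measured time against the expected and acceptable limits."""
    if measured_ms <= req.expected_ms:
        return "meets expectation"
    if measured_ms <= req.acceptable_ms:
        return "acceptable"
    return "needs optimization"

# Hypothetical requirement and measurements, for illustration only.
page_load = Requirement("page load", expected_ms=2_000, acceptable_ms=4_000)
for measured in (1_500, 3_200, 5_100):
    print(measured, assess(page_load, measured))
```

Only applications that land in "needs optimization" (or in "acceptable", if resources allow) move on to characterization of the application stack.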
Characterizing operations in a layer means understanding the frequency, data size and timing of each operation in that layer. Without this understanding, it is difficult to determine how to improve a particular piece of software. It is also important to understand each layer's contribution to the total time of a given operation. For example, when writing an MP3 file to a flash device, it is important to know how much time is spent in the application, the file system and the driver software to determine whether any of them is a bottleneck.
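One way to collect frequency, data size and per-layer time is to wrap each layer's entry point in lightweight instrumentation. The sketch below is a hypothetical profiler, not an existing tool: it records calls, bytes and self time (time in a layer minus time in the layers it calls). The `driver_write` and `fs_write` functions, the 16 KB chunk size and the `time.sleep` delays are stand-ins invented for illustration.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class LayerProfiler:
    """Records call count, bytes and self time per software layer."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.bytes = defaultdict(int)
        self.self_time = defaultdict(float)
        self._stack = []  # entries: [layer_name, time spent in nested layers]

    @contextmanager
    def layer(self, name: str, nbytes: int = 0):
        self._stack.append([name, 0.0])
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            _, child_time = self._stack.pop()
            self.calls[name] += 1
            self.bytes[name] += nbytes
            self.self_time[name] += elapsed - child_time  # exclude nested layers
            if self._stack:
                self._stack[-1][1] += elapsed  # charge this time to the parent as child time

    def report(self):
        total = sum(self.self_time.values()) or 1.0
        for name in sorted(self.self_time, key=self.self_time.get, reverse=True):
            share = self.self_time[name] / total
            print(f"{name:<12} calls={self.calls[name]:<4} bytes={self.bytes[name]:<7} self={share:.0%}")

prof = LayerProfiler()

def driver_write(data: bytes) -> None:
    with prof.layer("driver", len(data)):
        time.sleep(0.002)  # stand-in for device programming time

def fs_write(data: bytes, chunk: int = 16_384) -> None:
    with prof.layer("file_system", len(data)):
        time.sleep(0.001)  # stand-in for file system bookkeeping
        for i in range(0, len(data), chunk):
            driver_write(data[i:i + chunk])

with prof.layer("application", 65_536):
    fs_write(bytes(65_536))  # e.g., one 64 KB slice of an MP3 file
prof.report()
```

Self time, rather than inclusive time, is what shows where a given operation actually spends its time; inclusive time would attribute everything to the top-level application call.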
The flash memory devices must also be considered when characterizing operations in the application stack, because different flash devices and technologies perform differently in different situations. Poor performance may be due to the way the software interacts with the flash device. In certain cases, it may be beneficial to switch to a different flash technology to improve performance. Of course, this requires performance testing early enough in the design cycle to allow a change of flash technology, but sometimes such a change is needed to reach the desired performance.
Improve the application stack for the use cases
Once all of the previous steps have been completed, developers can focus on optimizing the application stack to address the performance problem. Without the information gathered in those steps, developers may focus on the wrong problem and never find the real performance issue. The first steps are the most important because they identify which performance issues matter most to end users; skipping them risks wasting a large amount of effort on issues the end user does not care about.
For this example, we will consider measuring flash performance on a high-end cell phone that uses single-level cell (SLC) NAND for the storage of code and data. Since a cell phone typically has many different applications, it is critical that the most important applications are identified for performance optimization.
For purposes of this example, assume that the web browser has been identified as the most important application for performance. The next step is to identify which browser operations are important to the end user. In our example, browser initialization time and the loading time of complex web pages are the most important performance measurements.
It is also important to decide which parameters that affect performance should be included in the measurements. For example, should network response time be included? This is something an OEM has no control over. There are several ways to exclude network response time, including using a Wi-Fi connection or loading pages over a local network only.
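Loading pages over a local network only can be arranged by hosting a fixed set of test pages on a local machine. The sketch below uses Python's standard `http.server` module to serve a directory and times fetching every object from the host itself, which gives a rough baseline for how quickly the local server delivers content. Assumptions: the page files are invented placeholders, and for a phone on the same network to load the pages, the server would have to bind the host's LAN address instead of `127.0.0.1`.

```python
from __future__ import annotations

import http.server
import tempfile
import threading
import time
import urllib.request
from functools import partial
from pathlib import Path

class QuietHandler(http.server.SimpleHTTPRequestHandler):
    def log_message(self, format, *args):  # suppress per-request logging
        pass

def timed_local_fetch(site_dir: str, paths: list[str], host: str = "127.0.0.1") -> float:
    """Serve site_dir locally and return the seconds needed to fetch all paths."""
    handler = partial(QuietHandler, directory=site_dir)
    server = http.server.ThreadingHTTPServer((host, 0), handler)  # port 0: pick a free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        start = time.perf_counter()
        for p in paths:
            with urllib.request.urlopen(f"http://{host}:{port}/{p}") as resp:
                resp.read()
        return time.perf_counter() - start
    finally:
        server.shutdown()
        server.server_close()

with tempfile.TemporaryDirectory() as site:
    # Placeholder test page with many small objects, like a complex web page.
    Path(site, "index.html").write_text("<html><body>test page</body></html>")
    for i in range(20):
        Path(site, f"img_{i}.png").write_bytes(bytes(8_192))
    files = ["index.html"] + [f"img_{i}.png" for i in range(20)]
    print(f"fetched {len(files)} objects in {timed_local_fetch(site, files):.3f} s")
```

Serving the same pages for every run also makes results repeatable, which live internet pages with changing content and advertisements cannot guarantee.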
With the measurements identified, the application stack can now be characterized. Web pages typically contain a large number of small data items to download, such as images, logos, HTML components and cookies. Characterization shows that the browser mainly writes small data items under 16 KB in size. The browser calls the file system to write these items and reads a significant number of them back from the browser cache. The file system is a FAT file system that uses a Flash Translation Layer (FTL) to manage NAND-specific operations, including sector reads/writes, bad block management, wear leveling and Power Loss Recovery (PLR). Small data writes are inefficient for the FAT file system, the FTL and the NAND device, which creates a performance bottleneck in page loading.
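A rough calculation shows why small writes are costly on NAND. The sketch below is a deliberately simplified model under stated assumptions: a 2 KB page size (an assumed value for the SLC device), page-aligned data, and a hypothetical two file system metadata updates per write (such as a FAT entry and a directory entry), each programming one full page. Real FTLs may batch or cache metadata, so the numbers are illustrative only.

```python
import math

def write_amplification(data_bytes: int, page_size: int = 2_048, metadata_updates: int = 2) -> float:
    """Bytes programmed into NAND per byte of user data, under a simplified model.

    Assumptions: data is page-aligned, and each metadata update programs one full page.
    """
    data_pages = math.ceil(data_bytes / page_size)
    programmed = (data_pages + metadata_updates) * page_size
    return programmed / data_bytes

for size in (512, 4_096, 16_384, 1_048_576):
    print(f"{size:>9} B -> {write_amplification(size):.2f}x")
```

Under these assumptions, a 512-byte write programs 12 times as many bytes as it stores, while a 1 MB write is close to 1x, which is consistent with small browser writes becoming the bottleneck.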
With the bottleneck identified, the development team can decide how to address it. Several options are possible, for example:
- Replace the FAT file system with a more efficient file system.
- Buffer small data writes into a smaller number of larger writes.
- Use partial page writes in the NAND device.
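The second option, buffering small writes, can be sketched as a write-coalescing buffer. The class below is a hypothetical illustration, not part of any particular file system: it collects small records in RAM and passes them to a `sink` callable as larger writes once a threshold (16 KB here, an assumed value) is reached.

```python
class CoalescingWriter:
    """Collects small records in RAM and emits them as fewer, larger writes."""

    def __init__(self, sink, flush_threshold: int = 16_384):
        self.sink = sink                      # callable that performs one large write
        self.flush_threshold = flush_threshold
        self._buf = bytearray()

    def write(self, data: bytes) -> None:
        self._buf += data
        if len(self._buf) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        if self._buf:
            self.sink(bytes(self._buf))
            self._buf.clear()

# Usage: 100 small 1,000-byte items become a handful of larger writes.
writes = []
w = CoalescingWriter(writes.append, flush_threshold=16_384)
for _ in range(100):
    w.write(bytes(1_000))
w.flush()  # write out whatever remains in the buffer
print(len(writes), sum(len(x) for x in writes))
```

The trade-off is that data still sitting in RAM is lost if power fails before a flush, so a real design would need to bound how long data may stay buffered and coordinate with the FTL's power loss recovery.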
This article has shown the importance of system-level performance benchmarking when evaluating the performance of flash devices. Understanding which applications will use the flash device, and how those applications are used, is as important as understanding the low-level performance specifications in the datasheet. Characterizing performance against these use models is the best way to identify performance bottlenecks and ultimately achieve the best performance from your system.