(DRAFT) Wonderland Client Asset Caching and Database
This document describes the implementation architecture and design of the client-side caching of
assets and the database that stores entries in the cache. The asset cache is simply a collection of
files in a directory hierarchy on a user's local disk. Each file in the cache has a corresponding entry
in an embedded Java DB (aka Apache Derby) database, also found on a user's local disk.
Cache and Database Location
The asset cache and database reside on a user's local disk, typically in the user's home directory. Their
location is configured via the ${wonderland.user.dir} property (the exact property name may change). Beneath
this directory, the asset cache and database define the following directory structure:
v2/
|---------- AssetDB/
|---------- cache/
where the v2/ directory reflects the version of the asset cache and database implementation. This
version directory was added in this release so that previous or future asset caches can coexist with
this one without interfering with it. The version number is defined within the asset cache source code.
Also, in this release, the ${wonderland.cache.dir} and ${wonderland.derby.dir} configuration parameters
are ignored.
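The following is a minimal sketch of resolving these directories in Java. It assumes that ${wonderland.user.dir} can be read as a Java system property named wonderland.user.dir. As a further assumption, it falls back to user.home when that property is unset. The class name CacheLayout is hypothetical.

```java
import java.nio.file.Path;

// Hypothetical helper illustrating the v2/ layout; not the Wonderland implementation.
final class CacheLayout {
    // Version directory from the layout above; the real value is defined in the cache source.
    static final String VERSION_DIR = "v2";

    // <user dir>/v2, falling back to user.home if wonderland.user.dir is unset (assumption).
    static Path versionRoot() {
        String userDir = System.getProperty("wonderland.user.dir",
                System.getProperty("user.home"));
        return Path.of(userDir, VERSION_DIR);
    }

    // <user dir>/v2/AssetDB: location of the embedded Derby database.
    static Path assetDbDir() {
        return versionRoot().resolve("AssetDB");
    }

    // <user dir>/v2/cache: root of the cached asset files.
    static Path cacheDir() {
        return versionRoot().resolve("cache");
    }
}
```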
Asset Database
Every asset in the cache has a corresponding entry in the asset database. The asset database is
simply an embedded Java DB (aka Apache Derby) database that is created by the Wonderland client
if it does not yet exist. It is placed within the AssetDB/ directory. To check whether an asset exists in
the cache, the Wonderland client first queries the asset database. When the Wonderland client adds
an asset to the cache, it also adds a corresponding entry in the database; when the asset is later
removed, its database entry is also removed.
The asset database consists of one table. Its structure is as follows (Table 1):
Table 1. The columns in the asset database table APP.ASSET
| Column Name | Data Type | Description | Properties |
| ASSET_URI | String (8192 max) | The URI that describes the asset. The format of this URI is described below. | Non-null; part of the composite primary key |
| CHECKSUM | String (40 max) | A string encoding of the asset checksum. | Non-null; part of the composite primary key |
| URL | String (8192 max) | The base URL of the repository from which the asset was obtained. | |
| TYPE | String (10 max) | The asset type: IMAGE, MODEL, FILE, OTHER. | |
| LAST_ACCESSED | BigInt (Long) | The time (in milliseconds since the epoch) the asset was last added, accessed, or updated. | |
| SIZE | BigInt (Long) | The size (in bytes) of the asset in the cache. | |
The table has a single composite primary key made up of two columns: ASSET_URI and CHECKSUM. Assets in the cache, therefore,
are uniquely identified by an (ASSET_URI, CHECKSUM) pair. This allows the cache to store different versions of the same asset
at the same time: each version of the asset has the same ASSET_URI but a different CHECKSUM.
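Expressed as SQL, the table in Table 1 could be declared roughly as shown below. This is a sketch, not the actual Wonderland DDL. The column types are inferred from Table 1 (String as VARCHAR, BigInt as BIGINT), and the AssetSchema class is hypothetical. The essential point is the single composite PRIMARY KEY clause.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical schema helper; column widths follow Table 1.
final class AssetSchema {
    // ASSET_URI and CHECKSUM together form one composite primary key,
    // so each row is identified by the (URI, checksum) pair.
    static final String CREATE_ASSET_TABLE =
            "CREATE TABLE APP.ASSET ("
            + "ASSET_URI VARCHAR(8192) NOT NULL, "
            + "CHECKSUM VARCHAR(40) NOT NULL, "
            + "URL VARCHAR(8192), "
            + "TYPE VARCHAR(10), "
            + "LAST_ACCESSED BIGINT, "
            + "SIZE BIGINT, "
            + "PRIMARY KEY (ASSET_URI, CHECKSUM))";

    // Creates the table on an already-open connection to the embedded database.
    static void createTable(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.executeUpdate(CREATE_ASSET_TABLE);
        }
    }
}
```

With the Derby embedded driver on the classpath, a JDBC URL of the form jdbc:derby:<database path>;create=true creates the database on first use. After that, createTable would need to run only once.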
Asset URI
Every asset is defined by a URI that describes where the asset comes from. In Wonderland v0.5, assets may belong to modules
that are installed on a Wonderland server. These assets may be served by any of a number of asset repositories located on the
Internet. The identity of an asset is tied to its module, not to the asset server from which it was downloaded. Even though a
Wonderland client may download an asset from one of several asset servers, it is still the "same" asset.
The format of the URI describing an asset belonging to a module is:
wlm://<module name>/<asset path>
where <module name> is the name of the module and the <asset path> is the relative path of the asset within the
module. Module names are globally unique and the asset path is unique only within its module.
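The following sketch splits such a URI into its two parts with java.net.URI. The ModuleAssetUri class is hypothetical. It reads the module name from the URI's authority component rather than its host, because a module name is not guaranteed to be a valid host name.

```java
import java.net.URI;

// Hypothetical value class for a parsed wlm:// asset URI.
final class ModuleAssetUri {
    final String moduleName;
    final String assetPath;

    private ModuleAssetUri(String moduleName, String assetPath) {
        this.moduleName = moduleName;
        this.assetPath = assetPath;
    }

    // Splits wlm://<module name>/<asset path> into its two parts.
    static ModuleAssetUri parse(String uri) {
        URI u = URI.create(uri);
        if (!"wlm".equals(u.getScheme())) {
            throw new IllegalArgumentException("not a module asset URI: " + uri);
        }
        // getAuthority() rather than getHost(): a module name need not be a legal host name.
        String module = u.getAuthority();
        String path = u.getPath();
        if (module == null || path == null || path.length() <= 1) {
            throw new IllegalArgumentException("missing module name or asset path: " + uri);
        }
        return new ModuleAssetUri(module, path.substring(1)); // drop the leading '/'
    }
}
```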
Assets need not be associated with a module. For example, an asset may be a single, explicit copy of a resource
located on the Internet, such as a document stored on the web. In this case, the asset URI may be a URL,
for example:
http://docs.sun.com/app/docs/doc/819-1771-24.pdf
Finally, assets may belong to the "system-wide" asset repository. This mechanism exists for backward compatibility with
Wonderland v0.3 and v0.4. In these versions, the base URL of the asset server is defined by a run-time property,
and each asset URI is specified as a relative path beneath that base URL, for example:
models/mpk20.jme.gz
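The three URI forms can be distinguished by scheme alone. The sketch below rests on three assumptions. First, the wlm scheme always denotes a module asset. Second, any other scheme denotes a definite URL. Third, a URI with no scheme is a path in the system-wide repository. AssetKind and AssetUriClassifier are hypothetical names.

```java
import java.net.URI;

// Hypothetical classification of asset URIs into the three forms described above.
enum AssetKind { MODULE, DEFINITE, SYSTEM }

final class AssetUriClassifier {
    static AssetKind classify(String assetUri) {
        String scheme = URI.create(assetUri).getScheme();
        if (scheme == null) {
            // No scheme: a path relative to the system-wide repository's base URL.
            return AssetKind.SYSTEM;
        }
        // wlm:// is a module asset; any other scheme is assumed to be a definite URL.
        return "wlm".equalsIgnoreCase(scheme) ? AssetKind.MODULE : AssetKind.DEFINITE;
    }
}
```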
Asset checksums
Whether a cached asset is used or whether an asset is downloaded fresh from an asset server depends (in part) upon
whether the checksum of the asset currently cached matches the checksum of the asset currently desired. The
checksum is a hex string-encoded representation of the SHA-1 hash of the asset's contents (although the specific
hash algorithm used by the implementation is not a key detail).
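Computing such a checksum requires only the JDK's MessageDigest. The sketch below produces lowercase hex, which fits the 40-character CHECKSUM column. This text does not state whether the real implementation uses lowercase or uppercase hex. AssetChecksum is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical checksum helper: SHA-1 digest encoded as a 40-character hex string.
final class AssetChecksum {
    private static MessageDigest newSha1() {
        try {
            return MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to support SHA-1.
            throw new IllegalStateException(e);
        }
    }

    private static String toHex(byte[] digest) {
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
        }
        return hex.toString();
    }

    // Checksum of an in-memory byte array.
    static String sha1Hex(byte[] data) {
        return toHex(newSha1().digest(data));
    }

    // Checksum of a stream (e.g. a downloaded asset), read in chunks.
    static String sha1Hex(InputStream in) throws IOException {
        MessageDigest md = newSha1();
        byte[] buf = new byte[8192];
        for (int n; (n = in.read(buf)) != -1; ) {
            md.update(buf, 0, n);
        }
        return toHex(md.digest());
    }
}
```

For example, sha1Hex("abc".getBytes(StandardCharsets.US_ASCII)) returns a9993e364706816aba3e25717850c26c9cd0d89d.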
An asset, therefore, is uniquely identified by the (Asset URI, Checksum) pair. This allows different versions of the
same asset to exist within the client-side cache. This is useful in the following example: suppose a user is
teleporting between two Wonderland servers that have similar worlds, except one has a more recent version
of a module installed that includes updated artwork (with different checksums from the assets in the other
world). Because an asset is uniquely identified by the (Asset URI, Checksum) pair, the assets on each server are
treated as distinct (even though they have the same asset URI). Both versions may be cached on a user's
local disk at the same time, which avoids re-downloading each asset after every teleport.
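The effect of the composite key can be illustrated with an in-memory stand-in for the asset table, keyed by the same pair. This is purely illustrative. AssetKey and CacheIndex are hypothetical, and the real client queries the Derby database instead.

```java
import java.util.HashMap;
import java.util.Map;

// Identity of a cached asset: the (ASSET_URI, CHECKSUM) pair. Hypothetical type.
record AssetKey(String assetUri, String checksum) {}

// Hypothetical in-memory stand-in for APP.ASSET, keyed like its primary key.
final class CacheIndex {
    private final Map<AssetKey, Long> sizes = new HashMap<>();

    void add(String assetUri, String checksum, long size) {
        sizes.put(new AssetKey(assetUri, checksum), size);
    }

    // A cache hit requires both the URI and the checksum to match.
    boolean contains(String assetUri, String checksum) {
        return sizes.containsKey(new AssetKey(assetUri, checksum));
    }

    int entryCount() {
        return sizes.size();
    }
}
```

Suppose wlm://mpk20/textures/poster.jpg is added twice with two different checksums, one per server's version. The index then holds two entries, and a lookup hits only when both the URI and the checksum match.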
Cache size and Least-Recently Used (LRU) replacement scheme
To prevent the asset cache from growing too large, the size of the cache has an upper limit, currently hard-coded
in the asset management source code. When the client-side asset manager attempts to add a new asset and
finds that the asset cache lacks room for it, it frees up space by removing the "oldest" entries until
there is enough room to add the new asset.
To help implement this scheme, the size of each asset is stored in the database (SIZE column). This size
is computed when the asset is first downloaded and cached. The overhead of maintaining the directory
structure of the cache is not included in the size calculation, so the maximum cache size should
not be considered a strict limit. The total size of the cache is computed with the SQL SUM() function over the SIZE column.
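The size query might look like the following JDBC sketch (CacheSizeQuery is hypothetical). One detail is worth handling: SUM() over an empty table returns NULL, so the query wraps it in COALESCE to report 0 for an empty cache.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical query helper for the total cache size.
final class CacheSizeQuery {
    // SUM() over an empty table yields NULL; COALESCE maps that to 0.
    static final String TOTAL_SIZE_SQL =
            "SELECT COALESCE(SUM(SIZE), 0) FROM APP.ASSET";

    static long totalCacheSize(Connection conn) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(TOTAL_SIZE_SQL);
             ResultSet rs = ps.executeQuery()) {
            return rs.next() ? rs.getLong(1) : 0L;
        }
    }
}
```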
Each cache entry in the database also records the time (in milliseconds since the epoch) at which
the asset was last accessed (LAST_ACCESSED column). An asset is accessed when it is first added
to the cache, when it is updated in the cache (e.g. if an asset is forcibly re-downloaded), or when it
is read from the cache.
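Recording an access amounts to a single UPDATE keyed on both primary-key columns. This hypothetical sketch (AssetAccess) assumes that the first add sets LAST_ACCESSED in its INSERT. The touch method therefore covers reads and forced re-downloads only.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper that stamps LAST_ACCESSED for one (URI, checksum) entry.
final class AssetAccess {
    static final String TOUCH_SQL =
            "UPDATE APP.ASSET SET LAST_ACCESSED = ? WHERE ASSET_URI = ? AND CHECKSUM = ?";

    // Sets LAST_ACCESSED to the current time in milliseconds since the epoch.
    // Returns true if a matching row was updated.
    static boolean touch(Connection conn, String assetUri, String checksum) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(TOUCH_SQL)) {
            ps.setLong(1, System.currentTimeMillis());
            ps.setString(2, assetUri);
            ps.setString(3, checksum);
            return ps.executeUpdate() == 1;
        }
    }
}
```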
Entries from the asset cache are removed only when a new entry must be added and there is not enough
room to do so. In such a case, the asset with the smallest (i.e. oldest) "last accessed" value is removed and
the total size of the cache is recomputed. If there is still not enough room in the cache for the new asset,
the asset with the next smallest "last accessed" value is removed. This process repeats until there is
enough room in the asset cache for the new asset.
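The eviction loop can be sketched in memory as follows (LruEviction is hypothetical). The text recomputes the total with SUM() after each removal. The sketch instead subtracts the removed entry's size, which yields the same total.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory model of the LRU replacement scheme.
final class LruEviction {
    // One row of the asset table, reduced to the fields eviction needs.
    record Entry(String assetUri, String checksum, long size, long lastAccessed) {}

    // Removes the entry with the smallest lastAccessed value, one at a time,
    // until newAssetSize fits under maxCacheSize. Returns the evicted entries in order.
    static List<Entry> makeRoom(List<Entry> cache, long newAssetSize, long maxCacheSize) {
        List<Entry> evicted = new ArrayList<>();
        long total = cache.stream().mapToLong(Entry::size).sum();
        while (total + newAssetSize > maxCacheSize && !cache.isEmpty()) {
            Entry oldest = cache.stream()
                    .min(Comparator.comparingLong(Entry::lastAccessed))
                    .get();
            cache.remove(oldest);
            evicted.add(oldest);
            total -= oldest.size(); // same result as recomputing SUM(SIZE)
        }
        return evicted;
    }
}
```

If the new asset is larger than the maximum cache size, this loop empties the cache and the asset still does not fit. The text does not say how that case is handled, so callers of this sketch would need to check for it.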
Asset Cache
The cached assets are stored beneath the cache/ directory, where each unique asset can be located
knowing only the asset URI and checksum. The cache/ directory has the following sub-directory structure
for assets belonging to a module (e.g. wlm://...), definite assets identified by a URL (e.g. http://....), and
assets belonging to the system-wide asset repository (e.g. models/mpk20.jme.gz).
cache/
|--------- modules/
|--------- definite/
|--------- system/
where assets belonging to a module are stored beneath the modules/ directory, assets identified by
a definite URL are stored beneath the definite/ directory, and assets belonging to the system-wide
asset repository are stored beneath the system/ directory.
Structure of the modules/ directory
Assets that belong to modules are uniquely identified, in part, by the name of the module in which
they reside and the relative path of the asset within the module. An asset is also identified by
its checksum: slightly different versions of an asset from the same module with the same relative path
may both be cached. Therefore, the module name, relative path, and checksum are all part of
locating an asset within the cache.
The directory structure beneath the modules/ directory takes the following form:
modules/<module name>/<relative path>/<checksum>
where <module name> is the unique name of the module, <relative path> is the relative
path of the asset within the module, and <checksum> is the asset's checksum. For example,
suppose two different versions of the same asset exist with the URI wlm://mpk20/textures/poster.jpg
(but with different checksums). The directory structure of the cache would be:
modules/
|-------- mpk20/
          |-------- textures/
                    |-------- poster.jpg/
                              |-------- ASD673FLSJKWE432342
                              |-------- DFGERDIDFGIOCB323ZD
where ASD673FLSJKWE432342 and DFGERDIDFGIOCB323ZD are (illustrative) checksums of the two
versions, and each is stored as a file. Note that "poster.jpg" is a directory within the cache, not a file.
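Given the cache directory, the location of a module asset follows directly from its three identifying parts. The sketch below uses java.nio.file.Path, and ModuleCachePath is hypothetical.

```java
import java.nio.file.Path;

// Hypothetical locator for cache/modules/<module name>/<relative path>/<checksum>.
final class ModuleCachePath {
    // The asset's relative path becomes a chain of directories (so "poster.jpg"
    // is a directory), and the checksum is the name of the cached file.
    static Path locate(Path cacheDir, String moduleName, String relativePath, String checksum) {
        return cacheDir.resolve("modules")
                .resolve(moduleName)
                .resolve(relativePath)
                .resolve(checksum);
    }
}
```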
Structure of the definite/ directory
Since an asset defined by a definite URL is globally unique, the URL defines the directory hierarchy
in which the cache file exists. For example, the asset defined by the URL "http://docs.sun.com/app/docs/doc/819-1771-24.pdf"
would be stored in "cache/definite/docs.sun.com/app/docs/doc/819-1771-24.pdf".
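The following is a sketch of this mapping (DefiniteCachePath is hypothetical). The text shows only the host and path in the cache location, so this sketch drops the scheme, port, query, and fragment. How the real cache treats those components is not described here.

```java
import java.net.URI;
import java.nio.file.Path;

// Hypothetical locator for cache/definite/<host>/<URL path>.
final class DefiniteCachePath {
    static Path locate(Path cacheDir, String url) {
        URI u = URI.create(url);
        String path = u.getPath();
        if (u.getHost() == null || path == null || path.isEmpty() || path.equals("/")) {
            throw new IllegalArgumentException("cannot map URL to a cache file: " + url);
        }
        // Scheme, port, query and fragment are ignored in this sketch.
        return cacheDir.resolve("definite").resolve(u.getHost()).resolve(path.substring(1));
    }
}
```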
Structure of the system/ directory
Since only a single, system-wide repository exists, assets that belong to the system-wide asset
repository are uniquely identified by the relative path name of the asset within the repository. The
directory hierarchy in which the cache file exists is defined by this relative path. For example, the
asset defined by the relative path "models/mpk20.jme.gz" would be stored in
"cache/system/models/mpk20.jme.gz".