Archive-Container - Best Practice

E-Mails in the REDDOXX MailDepot are stored in so-called Archive-Containers. Archive-Containers
are designed to store and index a very large number of individual E-Mails.
A single Archive-Container can therefore easily hold several million E-Mails or several terabytes
of data, provided the underlying storage meets the requirements described in this document.

An Archive-Container consists of two components: the data (Volume-Files) and the full-text index of the
E-Mails. The two components are stored separately from each other, but within a common Base-Directory.

To ensure that Archive-Containers can be operated consistently and without problems, it is important to choose suitable
storage. The deciding factor is whether the Archive-Container is still being written to or is only read by the Appliance.

For Archive-Containers that are still being written to (in particular the "Default" Container),
permanently available storage must be provided. Network-Shares should definitely be avoided here.

Archive-Containers that are read-only can also be integrated from Network-Shares without any problems.

The individual Storage-Types are described in more detail later in this document.

As already mentioned, an Archive-Container consists of two components, which are stored under a common
Base-Directory.

A closed Archive-Container can therefore be copied or moved between storages very easily:
all you need to do is copy or move the Base-Directory including all of its subdirectories.

An Archive-Container that is still open in the Appliance or in other Utilities
cannot be copied or moved consistently. Only closed Archive-Containers may be copied
or moved.
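If the copy is scripted, for example from a Linux host on which both storages are mounted, it is essentially a recursive copy of the Base-Directory. The following Python sketch is illustrative only: the function name and paths are hypothetical, and it assumes the Archive-Container has already been closed in the Appliance and in all other Utilities, which the code cannot check itself. After copying, it compares the relative file names and sizes on both sides as a basic sanity check.

```python
import shutil
from pathlib import Path


def copy_closed_container(base_dir: str, target_dir: str) -> Path:
    """Copy the Base-Directory of a *closed* Archive-Container, including all subdirectories.

    Assumption: the container is closed in the Appliance and in all other Utilities.
    This function cannot verify that; copying an open container may yield an
    inconsistent copy.
    """
    src = Path(base_dir)
    dst = Path(target_dir)
    # Recursive copy of Volume-Files and Fulltext-Index. copytree raises
    # FileExistsError if the target already exists, so nothing is overwritten.
    shutil.copytree(src, dst)

    def listing(root: Path) -> dict[str, int]:
        # Map each file's path relative to the Base-Directory to its size in bytes.
        return {
            str(p.relative_to(root)): p.stat().st_size
            for p in root.rglob("*")
            if p.is_file()
        }

    # Basic sanity check: the same files with the same sizes must exist on both sides.
    if listing(src) != listing(dst):
        raise RuntimeError(f"copy of {src} to {dst} is incomplete")
    return dst


# Example call with hypothetical mount points:
# copy_closed_container("/mnt/old-storage/Archive2019", "/mnt/new-storage/Archive2019")
```

Moving instead of copying is possible with `shutil.move` from the same module; copying first and deleting the original only after the copy has been verified is the more cautious approach.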

The Directory-Structure of an Archive-Container is as follows:

[Figure: Directory-Structure of an Archive-Container]

Info
The maximum size of the individual Volume-Files can be defined when creating a new Archive-Container.
This option exists for historical reasons and is obsolete.

The default is 4096 MB, and there is no longer any need to adjust this value.

Archive-Containers can be opened and searched on any Storage-Type, including Network-Shares.

Archive-Containers that are still being written to should always be stored on a Block-Device (Local Storage or iSCSI).

Both the Fulltext-Index and the Data-Volumes behave similarly to databases and constantly keep files open for writing.
If the network connection is interrupted while data is being written to an Archive-Container, or if the NAS is unavailable for a longer period of time (e.g. due to a reboot), inconsistencies can easily occur that may not be corrected automatically.

Network-Storages such as NFS or CIFS/SMB shares are therefore not suitable for securely storing Archive-Containers that are still being written to.

Archive-Containers that are only used for reading can also be stored on Network-Shares without any problems.

The REDDOXX Appliance supports the connection of different Storage-Types.
The following overview describes these Storage-Types and their recommended use.

Local Storage is a storage device connected directly to the REDDOXX Appliance. Since the
REDDOXX Appliance is now almost exclusively operated as a Virtual-Appliance, Local Storage corresponds to a Virtual-Disk provided by the virtualization environment.

Local Storage is the optimal storage for Archive-Containers, in particular for the "Default" Archive-Container and for other Archive-Containers that are in use, for example, by
Archive-Tasks.

-> NTFS-Formatted Local Storages

NTFS-formatted disks can be integrated into the Appliance, but the performance of Archive-Containers
on them is significantly worse (up to 10 times slower)!

In addition, a significantly higher CPU-Load is generated when Archive-Containers are mounted from an NTFS-formatted Disk.

-> Appliance failover configuration

Local Storage must not be used if two REDDOXX-Appliances are configured as a failover system.
However, a failover configuration with virtual appliances is neither common nor recommended.

Like Local Storage, iSCSI is block-based storage and is therefore also well suited for Archive-Containers that are still being written to.

The REDDOXX Appliance supports a standard connection to iSCSI-Targets.
In environments with more advanced iSCSI setups (e.g. multipath connections), the iSCSI connection should be handled by the virtualization solution.

If the virtualization solution is already connected to iSCSI, it is usually better to have the virtualizer provide local disks (Virtual-Disks) to the Appliance.

-> NTFS-Formatted iSCSI Storages

NTFS-formatted disks can be integrated into the Appliance, but the performance of Archive-Containers on them is significantly worse (up to 10 times slower)!
In addition, a significantly higher CPU-Load is generated when Archive-Containers are mounted from an NTFS-formatted Disk.

NFS-Shares are well suited for integrating Archive-Containers that are read-only.

When writing to Archive-Containers that are located on an NFS-Share, network interruptions can lead to a defect in the Archive-Container.
You should therefore avoid writing data to Archive-Containers if they are integrated into the REDDOXX Appliance from an NFS-Share.

The same applies to CIFS/SMB shares as to NFS-Shares. However, practice shows that, depending on the Storage-Device used,
the performance of Archive-Containers is sometimes significantly worse with CIFS/SMB than with NFS-Shares.
We therefore recommend using CIFS/SMB shares exclusively for backing up the Appliance.
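The recommendations for the individual Storage-Types can be summarized as a simple lookup. The following Python sketch only encodes the rules stated in this document; the names `StorageType` and `recommend_storage` are hypothetical and are not part of any REDDOXX interface.

```python
from enum import Enum


class StorageType(Enum):
    LOCAL = "Local Storage (Virtual-Disk)"
    ISCSI = "iSCSI"
    NFS = "NFS-Share"
    CIFS_SMB = "CIFS/SMB share"


# Block-based storages, suitable for Archive-Containers that are still written to.
BLOCK_DEVICES = {StorageType.LOCAL, StorageType.ISCSI}


def recommend_storage(storage: StorageType, writable: bool,
                      ntfs: bool = False, failover: bool = False) -> str:
    """Return this document's recommendation for one storage/usage combination (hypothetical helper)."""
    if storage is StorageType.LOCAL and failover:
        return "not allowed: Local Storage must not be used in a failover configuration"
    if storage in BLOCK_DEVICES:
        if ntfs:
            return "possible, but up to 10x slower and with higher CPU-Load (NTFS)"
        return "recommended" if writable else "suitable"
    # Network-Shares (NFS, CIFS/SMB)
    if writable:
        return "avoid: network interruptions can damage the Archive-Container"
    if storage is StorageType.CIFS_SMB:
        return "possible for read-only use, but often slower than NFS; preferably use for backups only"
    return "suitable for read-only Archive-Containers"
```

For example, `recommend_storage(StorageType.NFS, writable=True)` returns the "avoid" recommendation, while the same share used read-only is reported as suitable.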

As already described, the Archive-Containers were developed for very large volumes of data.
The theoretical limit of a single Archive-Container is approx. 400 TB or 4 billion E-Mails.

In practice, however, it does not make sense to create Archive-Containers this large, for various reasons.
On the other hand, too many small Archive-Containers have disadvantages of their own and are just as impractical.

In general, searches across the entire Archive perform better with fewer, larger Archive-Containers. Every search in the Archive must be run against
the Fulltext-Index of each individual Archive-Container, after which the individual results must be aggregated.
The more Archive-Containers there are, the greater the performance loss caused by aggregating the individual search results.
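The aggregation step can be pictured as a fan-out/merge: the query goes to every Fulltext-Index, each index returns its own hits, and the partial results are merged into one ranked list. The Python sketch below is a conceptual illustration only, not the actual REDDOXX implementation; the `Hit` type, the `search_archive` function and the numeric relevance score are assumptions. It shows why the work grows with the number of Archive-Containers: one index query per container, plus a merge over all partial results.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Hit:
    container: str   # name of the Archive-Container that produced the hit
    message_id: str
    score: float     # hypothetical relevance score


# Hypothetical signature of a full-text search on a single Archive-Container.
IndexSearch = Callable[[str], list[Hit]]


def search_archive(query: str, indexes: Iterable[IndexSearch], limit: int = 10) -> list[Hit]:
    """Run `query` against every Fulltext-Index and aggregate the results."""
    partial_results = []
    for search in indexes:          # one index query per Archive-Container
        partial_results.append(search(query))
    # Aggregation: merge all partial result lists and keep the best `limit` hits.
    all_hits = (hit for hits in partial_results for hit in hits)
    return heapq.nlargest(limit, all_hits, key=lambda h: h.score)
```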

There is no general answer to the question of how large an Archive-Container should be. From a purely technical point of view, the bigger the better.
In practice, however, a few points should be taken into account. The two most important are described below.

If, for example, there were only one very large Archive-Container, then during a migration (e.g. to new
storage) the entire E-Mail-Archive would be unavailable for the duration of the migration!

For this reason, the maximum size of an Archive-Container should be chosen so that, with the storage technologies in use,
the container can be "moved" within an acceptable time. With several Archive-Containers, these can, for example, be migrated one after another without the E-Mail-Archive becoming completely unavailable.
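Whether a given container size can be moved in an acceptable time can be estimated with simple arithmetic: the container size divided by the sustained copy throughput between the storages involved. The Python sketch below shows this calculation; the function names are hypothetical, and the throughput value is an assumption that has to be measured in the actual environment. The result is only a lower bound, since real copies rarely sustain their peak throughput.

```python
def migration_hours(container_gb: float, throughput_mb_s: float) -> float:
    """Estimated copy time in hours, assuming a constant sustained throughput (lower bound)."""
    seconds = container_gb * 1024 / throughput_mb_s
    return seconds / 3600


def max_container_gb(window_hours: float, throughput_mb_s: float) -> float:
    """Largest container (in GB) that can be copied within the given time window."""
    return window_hours * 3600 * throughput_mb_s / 1024
```

For example, at an assumed 100 MB/s, a 2048 GB container takes roughly 5.8 hours to copy, and an 8-hour window allows containers of up to about 2,800 GB.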

If the Fulltext-Index of an Archive-Container is defective, it can be rebuilt from the data. However, this Re-indexing takes time, and the larger the Archive-Container, the longer it takes. During Re-indexing, the Archive-Container is of course not available to users.

The use of Annual-Containers has proven itself in many installations.

This means that one Archive-Container is created per year. When a new Container is created, the previous one can be migrated to Network-Storage, for example, if required.

However, if only a few E-Mails are archived in the environment (e.g. fewer than 1 million per year),
it may make more sense to create a new Archive-Container only every 2 or 3 years,
to avoid many very small Archive-Containers.

If a large number of E-Mails are archived in the environment (e.g. more than 10 million per year), it may make sense to create two or more Archive-Containers per year.
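These rules of thumb can be written down as a small planning helper. The Python sketch below is hypothetical; the thresholds of 1 million and 10 million E-Mails per year are the examples from this section rather than hard limits, and the choice between 2 or 3 years (or between two or more containers per year) remains a judgment call for the specific environment.

```python
def container_plan(mails_per_year: int) -> str:
    """Suggest how often to create a new Archive-Container (rule of thumb from this section)."""
    if mails_per_year < 1_000_000:
        # Few E-Mails: avoid many very small Archive-Containers.
        return "one Archive-Container every 2 or 3 years"
    if mails_per_year > 10_000_000:
        # Many E-Mails: keep each container small enough to migrate and re-index in acceptable time.
        return "two or more Archive-Containers per year"
    # Typical case: Annual-Containers.
    return "one Archive-Container per year"
```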