UFile

API: Internal/Beta

A UFile is a resource for storing, retrieving and organizing data in UCloud.

data class UFile(
    val id: String,
    val specification: UFileSpecification,
    val createdAt: Long,
    val status: UFileStatus,
    val owner: ResourceOwner,
    val permissions: ResourcePermissions?,
    val updates: List<UFileUpdate>,
    val providerGeneratedId: String?,
)

A file in UCloud (UFile) closely follows the concept of a computer file you might already be familiar with. The functionality of a file is mostly determined by its type. The two most important types are the DIRECTORY and FILE types. A DIRECTORY is a container of UFiles. A directory can itself contain more directories, which leads to a natural tree-like structure. FILEs, also referred to as regular files, are data records which each contain a series of bytes.

All files in UCloud have a name associated with them. This name uniquely identifies them within their directory. All files in UCloud belong to exactly one directory.

File operations must be able to reference the files on which they operate. In UCloud, these references are made through the id property, also known as a path. A path uses the tree-like structure of files to reference a file: it declares which directories to go through, starting at the top, to reach the file being referenced. This information is serialized as a textual string, where each step of the path is separated by a forward-slash / (U+002F). The path must start with a single forward-slash, which signifies the root of the file tree. UCloud never uses ‘relative’ file paths, which some systems use.

All files in UCloud additionally have metadata associated with them. For this we differentiate between system-level metadata and user-defined metadata.

We have just covered two examples of system-level metadata: the id (path) and the type. UCloud additionally supports general statistics about files, such as file sizes. All files have a set of permissions associated with them. Providers may optionally expose this information to UCloud and its users.

User-defined metadata describes the contents of a file. All metadata is described by a template (FileMetadataTemplate), which defines a document structure for the metadata. User-defined metadata can be used for a variety of purposes, such as Datacite metadata, sensitivity levels, and other field-specific metadata formats.

Properties
id: String A unique reference to a file

All files in UCloud have a name associated with them. This name uniquely identifies them within their directory. All files in UCloud belong to exactly one directory. A name can be any textual string, for example: thesis-42.docx. However, certain restrictions apply to file names, see below for a concrete list of rules and recommendations.

The extension of a file is typically used as a hint to clients about how to treat a specific file. For example, an extension might indicate that the file contains a video of a specific format. In UCloud, the file’s extension is derived from its name: it is simply the text immediately following, and not including, the last period . (U+002E). The table below shows some examples of how UCloud determines the extension of a file:

| File name | Derived extension | Comment |
|---|---|---|
| thesis-42.docx | docx | - |
| thesis-43-final.tar | tar | - |
| thesis-43-FINAL2.tar.gz | gz | Note that UCloud does not recognize tar as being part of the extension |
| thesis | Empty string | - |
| .ssh | ssh | 'Hidden' files also have a surprising extension in UCloud |
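The derivation rule can be sketched in a few lines of Kotlin. This is only an illustration of the rule as stated above, not UCloud's implementation, and the function name derivedExtension is hypothetical.

```kotlin
// Hypothetical helper illustrating the documented rule: the extension is the
// text after the last period, or the empty string when there is no period.
fun derivedExtension(fileName: String): String =
    fileName.substringAfterLast('.', missingDelimiterValue = "")
```

Applied to the names in the table, this yields docx, tar, gz, the empty string and ssh, in that order.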

File operations must be able to reference the files on which they operate. In UCloud, these references are made through the path property. A path uses the tree-like structure of files to reference a file: it declares which directories to go through, starting at the top, to reach the file being referenced. This information is serialized as a textual string, where each step of the path is separated by a forward-slash / (U+002F). The path must start with a single forward-slash, which signifies the root of the file tree. UCloud never uses ‘relative’ file paths, which some systems use.

Paths in UCloud are structured in such a way that they are unique across all providers and file systems. The figure below shows how a UCloud path is structured, and how it can be mapped to an internal file-system path.

Figure: At the top, a UCloud path along with the components of it. At the bottom, an example of an internal, provider specific, file-system path.

The figure shows how a UCloud path consists of four components:

  1. The ‘Provider ID’ references the provider who owns and hosts the file

  2. The product reference identifies the product that is hosting the FileCollection

  3. The FileCollection ID references the ID of the internal file collection. These are controlled by the provider and match the different types of file-systems they have available. A single file collection typically maps to a specific folder on the provider’s file-system.

  4. The internal path, which tells the provider how to find the file within the collection. Providers can typically pass this as a one-to-one mapping.
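As a rough sketch, the four components could be separated as shown below. The exact serialized layout is not spelled out in the text, so this example assumes that the components appear in the order listed and that each of the first three occupies exactly one path step. The names UCloudPathComponents and splitUCloudPath are hypothetical and not part of the UCloud API.

```kotlin
// Hypothetical container for the four components described above.
data class UCloudPathComponents(
    val providerId: String,
    val productReference: String,
    val collectionId: String,
    val internalPath: String,
)

// Assumption: layout is /<provider>/<product>/<collection>/<internal path...>,
// with one path step per leading component. Returns null when the path is not
// absolute or does not contain the three leading components.
fun splitUCloudPath(path: String): UCloudPathComponents? {
    if (!path.startsWith("/")) return null
    val steps = path.removePrefix("/").split("/")
    if (steps.size < 3 || steps.take(3).any { it.isEmpty() }) return null
    return UCloudPathComponents(
        providerId = steps[0],
        productReference = steps[1],
        collectionId = steps[2],
        // The remainder is handed to the provider as-is, rooted at the collection.
        internalPath = "/" + steps.drop(3).joinToString("/"),
    )
}
```

Under these assumptions, a path such as /example-provider/example-product/1234/home/thesis-42.docx would yield the internal path /home/thesis-42.docx, which a provider could then resolve relative to the folder backing collection 1234.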

Rules of a file name:

  1. The name cannot be equal to . (commonly interpreted to mean the current directory)

  2. The name cannot be equal to .. (commonly interpreted to mean the parent directory)

  3. The name cannot contain a forward-slash / (U+002F)

  4. Names are strictly unicode

UCloud will normalize any path in which a step is equal to . or .. according to the interpretations given in rules 1 and 2.

Note that all paths in UCloud are strictly Unicode (rule 4). This is different from the Unix standard, where file names can contain arbitrary binary data. (TODO determine how providers should handle this edge-case)
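The four rules can be checked mechanically. The sketch below is one possible reading of them, not UCloud's validation code. The function names are hypothetical, and rule 4 is approximated by rejecting unpaired UTF-16 surrogates, which is an assumption about what "strictly Unicode" means for a Kotlin String.

```kotlin
// Returns true when the string contains no unpaired UTF-16 surrogates.
// Assumption: this is an adequate stand-in for "strictly Unicode" (rule 4).
fun isWellFormedUnicode(text: String): Boolean {
    var i = 0
    while (i < text.length) {
        val c = text[i]
        if (c.isHighSurrogate()) {
            if (i + 1 >= text.length || !text[i + 1].isLowSurrogate()) return false
            i += 2
        } else if (c.isLowSurrogate()) {
            return false
        } else {
            i++
        }
    }
    return true
}

// Hypothetical helper: returns the numbers of the rules that the name violates.
fun violatedNameRules(name: String): List<Int> {
    val violated = mutableListOf<Int>()
    if (name == ".") violated += 1
    if (name == "..") violated += 2
    if ('/' in name) violated += 3
    if (!isWellFormedUnicode(name)) violated += 4
    return violated
}
```

An empty result means the name passes all four rules. Returning rule numbers rather than a Boolean makes it easy to report to a user which rule was broken.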

Additionally regarding file names, UCloud recommends to users the following:

  • Avoid the following file names:

    • Containing Windows reserved characters: <, >, :, ", /, |, ?, *, \

    • Any of the reserved file names in Windows:

      • AUX

      • COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9

      • CON

      • LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9

      • NUL

      • PRN

      • Any of the above followed by an extension

    • Avoid ASCII control characters (decimal value 0-31 both inclusive)

    • Avoid Unicode control characters (e.g. right-to-left override)

    • Avoid line breaks, paragraph separators and other Unicode separators which are typically interpreted as a line break

    • Avoid binary names

UCloud will attempt to reject these for file operations initiated through the client, but it cannot ensure that these files do not appear regardless. This is due to the fact that the file systems are typically mounted directly by user-controlled jobs.
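A client-side check for some of these recommendations might look like the sketch below. It is illustrative only: the names are hypothetical, reserved names are compared case-insensitively (an assumption based on Windows being case-insensitive), and the Unicode check covers only the FORMAT, LINE_SEPARATOR and PARAGRAPH_SEPARATOR character categories.

```kotlin
// Windows reserved characters, as listed in the recommendations above.
val windowsReservedCharacters = setOf('<', '>', ':', '"', '/', '|', '?', '*', '\\')

// Reserved Windows file names: AUX, CON, NUL, PRN, COM1-COM9 and LPT1-LPT9.
val windowsReservedNames: Set<String> =
    setOf("AUX", "CON", "NUL", "PRN") +
        (1..9).map { "COM$it" } +
        (1..9).map { "LPT$it" }

// Unicode categories treated as problematic in this sketch (an assumption).
val discouragedCategories =
    setOf(CharCategory.FORMAT, CharCategory.LINE_SEPARATOR, CharCategory.PARAGRAPH_SEPARATOR)

// Hypothetical helper: returns one warning per recommendation the name goes against.
fun nameWarnings(name: String): List<String> {
    val warnings = mutableListOf<String>()
    if (name.any { it in windowsReservedCharacters }) {
        warnings += "contains a Windows reserved character"
    }
    // "Any of the above followed by an extension": compare the text before the first period.
    if (name.substringBefore('.').uppercase() in windowsReservedNames) {
        warnings += "is a reserved Windows file name"
    }
    if (name.any { it.code in 0..31 }) {
        warnings += "contains an ASCII control character"
    }
    if (name.any { it.category in discouragedCategories }) {
        warnings += "contains a Unicode control or separator character"
    }
    return warnings
}
```

Because these are recommendations rather than rules, the sketch returns warnings instead of rejecting the name outright.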

Rules of a file path:

  1. All paths must be absolute, that is they must start with /

  2. UCloud will normalize all path ‘steps’ containing either . or ..
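The two path rules can be combined into a small normalization routine. The sketch below is one plausible interpretation rather than UCloud's actual algorithm. It assumes that empty steps (repeated or trailing slashes) are dropped and that a .. step at the root stays at the root. The function name normalizePath is hypothetical.

```kotlin
// Hypothetical helper: resolves "." and ".." steps in an absolute path.
fun normalizePath(path: String): String {
    // Rule 1: all paths must be absolute.
    require(path.startsWith("/")) { "UCloud paths must be absolute: $path" }

    val steps = mutableListOf<String>()
    for (step in path.split("/")) {
        when (step) {
            // Assumption: empty steps are ignored, like the current directory.
            "", "." -> { }
            // The parent directory; at the root there is nothing to remove.
            ".." -> { if (steps.isNotEmpty()) steps.removeAt(steps.lastIndex) }
            else -> { steps.add(step) }
        }
    }
    return "/" + steps.joinToString("/")
}
```

With this sketch, /home/./user/../thesis normalizes to /home/thesis, and a relative path such as home/thesis fails with an IllegalArgumentException.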

Additionally UCloud recommends to users the following regarding paths:

  • Avoid long paths:

    • Older Unix versions report PATH_MAX as 1024

    • Newer Unix versions report PATH_MAX as 4096

    • Older versions of Windows start failing above 256 characters

specification: UFileSpecification
createdAt: Long Timestamp referencing when the request for creation was received by UCloud
status: UFileStatus Holds the current status of the `Resource`
owner: ResourceOwner Contains information about the original creator of the `Resource` along with project association
permissions: ResourcePermissions? Permissions assigned to this resource

A null value indicates that permissions are not supported by this resource type.

updates: List<UFileUpdate>
providerGeneratedId: String?