Ingoing API
The ingoing provider API for Jobs
Rationale
Jobs in UCloud are the core abstraction used to describe units of computation.
This document describes the API which providers receive to implement Jobs. We recommend that you read the documentation for the end-user API first. Most of this API is a natural extension of the end-user APIs, and almost all RPCs in this API have a direct match in the end-user API. The main difference is that most endpoints in the provider API receive a Job along with extra call details, whereas in the end-user API the request is mainly a reference or a specification plus call details.
It is not required that you, as a provider, implement all calls. However, you must implement all the calls which you declare support for. This level of support is controlled by your response to the retrieveProducts call (see below for an example).
📝 Provider Note: This is the API exposed to providers. See the table below for other relevant APIs.
End-User | Provider (Ingoing) | Control (Outgoing) |
---|---|---|
Jobs | JobsProvider | JobsControl |
Multi-replica Jobs (Container backend)
A Job can be scheduled on more than one replica. The orchestrator requires that backends execute the exact same command on all the nodes. Information about other nodes will be mounted at /etc/ucloud. This information allows jobs to configure themselves accordingly.
Each node is given a rank. The rank is 0-indexed. By convention, rank 0 is used as the primary point of contact.
The table below summarizes the files mounted at /etc/ucloud and their contents:
Name | Description |
---|---|
node-$rank.txt | Single line containing the hostname/IP address of the 'node'. |
rank.txt | Single line containing the rank of this node. |
cores.txt | Single line containing the number of cores allocated. |
number_of_nodes.txt | Single line containing the number of nodes allocated. |
job_id.txt | Single line containing the id of this job. |
📝 NOTE: We expect that the mount location will become more flexible in a future release. See issue #2124.
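As an illustration of how a job could use these files, the following is a minimal Kotlin sketch that reads them into a single value. The `NodeInfo` class and the `loadNodeInfo`, `readSingleLine` and `isPrimary` functions are hypothetical names introduced here, not part of UCloud. The sketch also assumes that `rank.txt`, `cores.txt` and `number_of_nodes.txt` contain plain integers. The directory is a parameter, since the mount location may change in a future release.

```kotlin
import java.nio.file.Files
import java.nio.file.Path

// Hypothetical holder for the values found in the mounted directory.
data class NodeInfo(
    val rank: Int,
    val numberOfNodes: Int,
    val cores: Int,
    val jobId: String,
    val peers: List<String>, // hostname/IP address per node, where index == rank
)

// Reads the first line of a file and trims surrounding whitespace.
fun readSingleLine(dir: Path, name: String): String =
    Files.readAllLines(dir.resolve(name)).first().trim()

// Loads the node information from the given directory (defaults to /etc/ucloud).
// Assumption: the numeric files contain plain integers.
fun loadNodeInfo(dir: Path = Path.of("/etc/ucloud")): NodeInfo {
    val numberOfNodes = readSingleLine(dir, "number_of_nodes.txt").toInt()
    return NodeInfo(
        rank = readSingleLine(dir, "rank.txt").toInt(),
        numberOfNodes = numberOfNodes,
        cores = readSingleLine(dir, "cores.txt").toInt(),
        jobId = readSingleLine(dir, "job_id.txt"),
        // One node-$rank.txt file is expected for every rank from 0 to numberOfNodes - 1.
        peers = (0 until numberOfNodes).map { readSingleLine(dir, "node-$it.txt") },
    )
}

// By convention, rank 0 is the primary point of contact.
fun NodeInfo.isPrimary(): Boolean = rank == 0
```

A job could call `loadNodeInfo()` at startup and, for example, let the node for which `isPrimary()` returns true coordinate the others through the addresses in `peers`.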
Networking and Peering with Other Applications
Jobs are, by default, only allowed to perform networking with other nodes in the same Job. A user can override this by requesting, at Job startup, networking with an existing job. This will configure the firewall accordingly and allow networking between the two Jobs. This will also automatically provide user-friendly hostnames for the Job.
The /work directory (Container backend)
UCloud assumes that the /work directory is available for data which needs to be persisted. It is expected that files left directly in this directory are placed in the output folder of the Job.
Ephemeral Resources
Every Job has some resources which exist only as long as the Job is RUNNING. These types of resources are said to be ephemeral resources. Examples of this include temporary working storage included as part of the Job. Such storage is not guaranteed to be persisted across Job runs and Applications should not rely on this behavior.
Job Scheduler
The job scheduler is responsible for running Jobs on behalf of users. The provider can tweak which features the scheduler is able to support using the provider manifest.
UCloud puts no strict requirements on how the job scheduler runs jobs and leaves this to the provider. For example, this means that there are no strict requirements on how jobs are queued. Jobs can be run in any order which the provider sees fit.
Table of Contents
1. Examples
2. Remote Procedure Calls
Name | Description |
---|---|
follow | Follow the progress of a job |
retrieveProducts | Retrieve product support for this provider |
retrieveUtilization | Retrieve information about how busy the provider's cluster currently is |
create | Creates one or more resources |
extend | Extend the duration of one or more jobs |
init | Request from the user to (potentially) initialize any resources |
openInteractiveSession | Opens an interactive session (e.g. terminal, web or VNC) |
suspend | Suspend a job |
terminate | Request job cancellation and destruction |
unsuspend | Unsuspends a job |
updateAcl | Callback received by the Provider when permissions are updated |
verify | Invoked by UCloud/Core to trigger verification of a single batch |
3. Data Models
Name | Description |
---|---|
JobsProviderExtendRequestItem | A request to extend the timeAllocation of a Job |
JobsProviderFollowRequest | A request to start/stop a follow session |
JobsProviderFollowRequest.CancelStream | Stop an existing follow session for a given Job |
JobsProviderFollowRequest.Init | Start a new follow session for a given Job |
JobsProviderOpenInteractiveSessionRequestItem | A request for opening a new interactive session (e.g. terminal) |
JobsProviderSuspendRequestItem | No description |
JobsProviderUtilizationRequest | No description |
JobsProviderFollowResponse | A message emitted by the Provider in a follow session |
Example: Declaring full support for containerized applications
Frequency of use | Common |
---|---|
Actors |
|
Communication Flow: Kotlin
/* In this example we will show how you, as a provider, can declare full support for containerized
applications. This example assumes that you have already registered two compute products with
UCloud/Core. */
/* The retrieveProducts call will be invoked by the UCloud/Core service account. UCloud will generally
cache this response for a period of time before re-querying for information. As a result, changes
in your response might not be immediately visible in UCloud. */
JobsProvider.retrieveProducts.call(
Unit,
ucloud
).orThrow()
/*
BulkResponse(
responses = listOf(ComputeSupport(
docker = ComputeSupport.Docker(
enabled = true,
logs = true,
peers = true,
terminal = true,
timeExtension = true,
utilization = true,
vnc = true,
web = true,
),
maintenance = null,
native = ComputeSupport.Native(
enabled = null,
logs = null,
terminal = null,
timeExtension = null,
utilization = null,
vnc = null,
web = null,
),
product = ProductReference(
category = "example-compute",
id = "example-compute-1",
provider = "example",
),
virtualMachine = ComputeSupport.VirtualMachine(
enabled = null,
logs = null,
suspension = null,
terminal = null,
timeExtension = null,
utilization = null,
vnc = null,
),
), ComputeSupport(
docker = ComputeSupport.Docker(
enabled = true,
logs = true,
peers = true,
terminal = true,
timeExtension = true,
utilization = true,
vnc = true,
web = true,
),
maintenance = null,
native = ComputeSupport.Native(
enabled = null,
logs = null,
terminal = null,
timeExtension = null,
utilization = null,
vnc = null,
web = null,
),
product = ProductReference(
category = "example-compute",
id = "example-compute-2",
provider = "example",
),
virtualMachine = ComputeSupport.VirtualMachine(
enabled = null,
logs = null,
suspension = null,
terminal = null,
timeExtension = null,
utilization = null,
vnc = null,
),
)),
)
*/
/* 📝 Note: The support information must be repeated for every Product you support. */
/* 📝 Note: The Products mentioned in this response must already be registered with UCloud. */
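Because the support information must be repeated for every Product, a provider may prefer to declare the feature set once and generate one entry per Product. The following is a minimal sketch of that idea. `ProductRef`, `DockerSupport` and `ProductSupport` are simplified, hypothetical stand-ins for `ProductReference`, `ComputeSupport.Docker` and `ComputeSupport`, reduced to the fields shown in the response above; they are not the real UCloud classes, and `supportForAll` is a helper invented for this illustration.

```kotlin
// Simplified stand-in for ProductReference.
data class ProductRef(val id: String, val category: String, val provider: String)

// Simplified stand-in for ComputeSupport.Docker. Every flag defaults to null.
data class DockerSupport(
    val enabled: Boolean? = null,
    val web: Boolean? = null,
    val vnc: Boolean? = null,
    val logs: Boolean? = null,
    val terminal: Boolean? = null,
    val peers: Boolean? = null,
    val timeExtension: Boolean? = null,
    val utilization: Boolean? = null,
)

// Simplified stand-in for ComputeSupport, limited to the container backend.
data class ProductSupport(val product: ProductRef, val docker: DockerSupport = DockerSupport())

// The shared feature set, declared once.
val fullDockerSupport = DockerSupport(
    enabled = true, web = true, vnc = true, logs = true,
    terminal = true, peers = true, timeExtension = true, utilization = true,
)

// Produces exactly one support entry per product, so that no product is left out.
fun supportForAll(products: List<ProductRef>): List<ProductSupport> =
    products.map { ProductSupport(product = it, docker = fullDockerSupport) }
```

Calling `supportForAll` with the two example products would yield two entries that differ only in their `product` field, which mirrors the shape of the response shown above.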
Communication Flow: Curl
# ------------------------------------------------------------------------------------------------------
# $host is the UCloud instance to contact. Example: 'http://localhost:8080' or 'https://cloud.sdu.dk'
# $accessToken is a valid access-token issued by UCloud
# ------------------------------------------------------------------------------------------------------
# In this example we will show how you, as a provider, can declare full support for containerized
# applications. This example assumes that you have already registered two compute products with
# UCloud/Core.
# The retrieveProducts call will be invoked by the UCloud/Core service account. UCloud will generally
# cache this response for a period of time before re-querying for information. As a result, changes
# in your response might not be immediately visible in UCloud.
# Authenticated as ucloud
curl -XGET -H "Authorization: Bearer $accessToken" "$host/ucloud/PROVIDERID/jobs/retrieveProducts"
# {
# "responses": [
# {
# "product": {
# "id": "example-compute-1",
# "category": "example-compute",
# "provider": "example"
# },
# "docker": {
# "enabled": true,
# "web": true,
# "vnc": true,
# "logs": true,
# "terminal": true,
# "peers": true,
# "timeExtension": true,
# "utilization": true
# },
# "virtualMachine": {
# "enabled": null,
# "logs": null,
# "vnc": null,
# "terminal": null,
# "timeExtension": null,
# "suspension": null,
# "utilization": null
# },
# "native": {
# "enabled": null,
# "logs": null,
# "vnc": null,
# "terminal": null,
# "timeExtension": null,
# "utilization": null,
# "web": null
# },
# "maintenance": null
# },
# {
# "product": {
# "id": "example-compute-2",
# "category": "example-compute",
# "provider": "example"
# },
# "docker": {
# "enabled": true,
# "web": true,
# "vnc": true,
# "logs": true,
# "terminal": true,
# "peers": true,
# "timeExtension": true,
# "utilization": true
# },
# "virtualMachine": {
# "enabled": null,
# "logs": null,
# "vnc": null,
# "terminal": null,
# "timeExtension": null,
# "suspension": null,
# "utilization": null
# },
# "native": {
# "enabled": null,
# "logs": null,
# "vnc": null,
# "terminal": null,
# "timeExtension": null,
# "utilization": null,
# "web": null
# },
# "maintenance": null
# }
# ]
# }
# 📝 Note: The support information must be repeated for every Product you support.
# 📝 Note: The Products mentioned in this response must already be registered with UCloud.
Example: Declaring minimal support for virtual machines
Frequency of use | Common |
---|---|
Actors |
|
Communication Flow: Kotlin
/* In this example we will show how you, as a provider, can declare minimal support for virtual
machines. This example assumes that you have already registered two compute products with
UCloud/Core. */
/* The retrieveProducts call will be invoked by the UCloud/Core service account. UCloud will generally
cache this response for a period of time before re-querying for information. As a result, changes
in your response might not be immediately visible in UCloud. */
JobsProvider.retrieveProducts.call(
Unit,
ucloud
).orThrow()
/*
BulkResponse(
responses = listOf(ComputeSupport(
docker = ComputeSupport.Docker(
enabled = null,
logs = null,
peers = null,
terminal = null,
timeExtension = null,
utilization = null,
vnc = null,
web = null,
),
maintenance = null,
native = ComputeSupport.Native(
enabled = null,
logs = null,
terminal = null,
timeExtension = null,
utilization = null,
vnc = null,
web = null,
),
product = ProductReference(
category = "example-compute",
id = "example-compute-1",
provider = "example",
),
virtualMachine = ComputeSupport.VirtualMachine(
enabled = true,
logs = null,
suspension = null,
terminal = null,
timeExtension = null,
utilization = null,
vnc = null,
),
), ComputeSupport(
docker = ComputeSupport.Docker(
enabled = null,
logs = null,
peers = null,
terminal = null,
timeExtension = null,
utilization = null,
vnc = null,
web = null,
),
maintenance = null,
native = ComputeSupport.Native(
enabled = null,
logs = null,
terminal = null,
timeExtension = null,
utilization = null,
vnc = null,
web = null,
),
product = ProductReference(
category = "example-compute",
id = "example-compute-2",
provider = "example",
),
virtualMachine = ComputeSupport.VirtualMachine(
enabled = true,
logs = null,
suspension = null,
terminal = null,
timeExtension = null,
utilization = null,
vnc = null,
),
)),
)
*/
/* 📝 Note: If a support feature is not explicitly mentioned, then no support is assumed. */
/* 📝 Note: The support information must be repeated for every Product you support. */
/* 📝 Note: The Products mentioned in this response must already be registered with UCloud. */
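The rule that an unmentioned feature means no support can be captured in a small helper on the consuming side. The sketch below is an illustration only: `VirtualMachineSupport` is a simplified, hypothetical stand-in for `ComputeSupport.VirtualMachine` with the fields shown above, and `isSupported` and `supportedFeatures` are helper names invented here.

```kotlin
// Simplified stand-in for ComputeSupport.VirtualMachine. Every flag defaults to null.
data class VirtualMachineSupport(
    val enabled: Boolean? = null,
    val logs: Boolean? = null,
    val suspension: Boolean? = null,
    val terminal: Boolean? = null,
    val timeExtension: Boolean? = null,
    val utilization: Boolean? = null,
    val vnc: Boolean? = null,
)

// A feature counts as supported only when it is explicitly true.
// Both null (not mentioned) and false are treated as "not supported".
fun Boolean?.isSupported(): Boolean = this == true

// Lists the names of the features which are explicitly enabled.
fun VirtualMachineSupport.supportedFeatures(): List<String> =
    listOf(
        "enabled" to enabled,
        "logs" to logs,
        "suspension" to suspension,
        "terminal" to terminal,
        "timeExtension" to timeExtension,
        "utilization" to utilization,
        "vnc" to vnc,
    ).filter { (_, flag) -> flag.isSupported() }.map { (name, _) -> name }
```

For the minimal declaration in this example, `VirtualMachineSupport(enabled = true).supportedFeatures()` contains only `"enabled"`.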
Communication Flow: Curl
# ------------------------------------------------------------------------------------------------------
# $host is the UCloud instance to contact. Example: 'http://localhost:8080' or 'https://cloud.sdu.dk'
# $accessToken is a valid access-token issued by UCloud
# ------------------------------------------------------------------------------------------------------
# In this example we will show how you, as a provider, can declare minimal support for virtual
# machines. This example assumes that you have already registered two compute products with
# UCloud/Core.
# The retrieveProducts call will be invoked by the UCloud/Core service account. UCloud will generally
# cache this response for a period of time before re-querying for information. As a result, changes
# in your response might not be immediately visible in UCloud.
# Authenticated as ucloud
curl -XGET -H "Authorization: Bearer $accessToken" "$host/ucloud/PROVIDERID/jobs/retrieveProducts"
# {
# "responses": [
# {
# "product": {
# "id": "example-compute-1",
# "category": "example-compute",
# "provider": "example"
# },
# "docker": {
# "enabled": null,
# "web": null,
# "vnc": null,
# "logs": null,
# "terminal": null,
# "peers": null,
# "timeExtension": null,
# "utilization": null
# },
# "virtualMachine": {
# "enabled": true,
# "logs": null,
# "vnc": null,
# "terminal": null,
# "timeExtension": null,
# "suspension": null,
# "utilization": null
# },
# "native": {
# "enabled": null,
# "logs": null,
# "vnc": null,
# "terminal": null,
# "timeExtension": null,
# "utilization": null,
# "web": null
# },
# "maintenance": null
# },
# {
# "product": {
# "id": "example-compute-2",
# "category": "example-compute",
# "provider": "example"
# },
# "docker": {
# "enabled": null,
# "web": null,
# "vnc": null,
# "logs": null,
# "terminal": null,
# "peers": null,
# "timeExtension": null,
# "utilization": null
# },
# "virtualMachine": {
# "enabled": true,
# "logs": null,
# "vnc": null,
# "terminal": null,
# "timeExtension": null,
# "suspension": null,
# "utilization": null
# },
# "native": {
# "enabled": null,
# "logs": null,
# "vnc": null,
# "terminal": null,
# "timeExtension": null,
# "utilization": null,
# "web": null
# },
# "maintenance": null
# }
# ]
# }
# 📝 Note: If a support feature is not explicitly mentioned, then no support is assumed.
# 📝 Note: The support information must be repeated for every Product you support.
# 📝 Note: The Products mentioned in this response must already be registered with UCloud.
Example: Simple batch Job with life-cycle events
Frequency of use | Common |
---|---|
Pre-conditions |
|
Actors |
|
Communication Flow: Kotlin
/* In this example we will show the creation of a simple batch Job. The procedure starts when the
Provider receives a create request from UCloud/Core. */
/* The request below contains a lot of information. We recommend that you read about and understand
Products, Applications and Jobs before you continue. We will attempt to summarize the information
below:
- The request contains one or more Jobs. The Provider should schedule each of them on their
infrastructure.
- The `id` of a Job is unique, globally, in UCloud.
- The `owner` references the UCloud identity and workspace of the creator
- The `specification` contains the user's request
- The `status` contains UCloud's view of the Job _AND_ resolved resources required for the Job
In this example:
- Exactly one Job will be created.
- `items` contains only one Job
- This Job will run a `BATCH` application
- See `status.resolvedApplication.invocation.applicationType`
- It will run on the `example-compute-1` machine-type
- See `specification.product` and `status.resolvedProduct`
- The application should launch the `acme/batch:1.0.0` container
- `status.resolvedApplication.invocation.tool.tool.description.backend`
- `status.resolvedApplication.invocation.tool.tool.description.image`
- It will be invoked with `acme-batch --debug "Hello, World!"`.
- The invocation is created from `status.resolvedApplication.invocation.invocation`
- With parameters defined in `status.resolvedApplication.invocation.parameters`
- And values defined in `specification.parameters`
- The Job should be scheduled with a max wall-time of 1 hour
- See `specification.timeAllocation`
- ...on exactly 1 node.
- See `specification.replicas` */
JobsProvider.create.call(
bulkRequestOf(Job(
createdAt = 1633329776235,
id = "54112",
output = null,
owner = ResourceOwner(
createdBy = "user",
project = null,
),
permissions = null,
specification = JobSpecification(
allowDuplicateJob = false,
application = NameAndVersion(
name = "acme-batch",
version = "1.0.0",
),
name = null,
openedFile = null,
parameters = mapOf("debug" to AppParameterValue.Bool(
value = true,
), "value" to AppParameterValue.Text(
value = "Hello, World!",
)),
product = ProductReference(
category = "example-compute",
id = "example-compute-1",
provider = "example",
),
replicas = 1,
resources = null,
restartOnExit = null,
sshEnabled = null,
timeAllocation = SimpleDuration(
hours = 1,
minutes = 0,
seconds = 0,
),
),
status = JobStatus(
allowRestart = false,
expiresAt = null,
jobParametersJson = null,
resolvedApplication = Application(
invocation = ApplicationInvocationDescription(
allowAdditionalMounts = null,
allowAdditionalPeers = null,
allowMultiNode = false,
allowPublicIp = false,
allowPublicLink = null,
applicationType = ApplicationType.BATCH,
container = null,
environment = null,
fileExtensions = emptyList(),
invocation = listOf(WordInvocationParameter(
word = "acme-batch",
), VariableInvocationParameter(
isPrefixVariablePartOfArg = false,
isSuffixVariablePartOfArg = false,
prefixGlobal = "--debug ",
prefixVariable = "",
suffixGlobal = "",
suffixVariable = "",
variableNames = listOf("debug"),
), VariableInvocationParameter(
isPrefixVariablePartOfArg = false,
isSuffixVariablePartOfArg = false,
prefixGlobal = "",
prefixVariable = "",
suffixGlobal = "",
suffixVariable = "",
variableNames = listOf("value"),
)),
licenseServers = emptyList(),
modules = null,
outputFileGlobs = listOf("*"),
parameters = listOf(ApplicationParameter.Bool(
defaultValue = null,
description = "Should debug be enabled?",
falseValue = "false",
name = "debug",
optional = false,
title = "",
trueValue = "true",
), ApplicationParameter.Text(
defaultValue = null,
description = "The value for the batch application",
name = "value",
optional = false,
title = "",
)),
shouldAllowAdditionalMounts = false,
shouldAllowAdditionalPeers = true,
ssh = null,
tool = ToolReference(
name = "acme-batch",
tool = Tool(
createdAt = 1633329776235,
description = NormalizedToolDescription(
authors = listOf("UCloud"),
backend = ToolBackend.DOCKER,
container = null,
defaultNumberOfNodes = 1,
defaultTimeAllocation = SimpleDuration(
hours = 1,
minutes = 0,
seconds = 0,
),
description = "An example tool",
image = "acme/batch:1.0.0",
info = NameAndVersion(
name = "acme-batch",
version = "1.0.0",
),
license = "None",
requiredModules = emptyList(),
supportedProviders = null,
title = "Acme batch",
),
modifiedAt = 1633329776235,
owner = "_ucloud",
),
version = "1.0.0",
),
vnc = null,
web = null,
),
metadata = ApplicationMetadata(
authors = listOf("UCloud"),
createdAt = 1717663228341,
description = "An example application",
flavorName = null,
group = ApplicationGroup(
metadata = ApplicationGroup.Metadata(
id = 0,
),
specification = ApplicationGroup.Specification(
categories = emptySet(),
colorReplacement = ApplicationGroup.ColorReplacements(
dark = null,
light = null,
),
defaultFlavor = null,
description = "",
logoHasText = false,
title = "Test Group",
),
status = ApplicationGroup.Status(
applications = null,
),
),
isPublic = true,
name = "acme-batch",
public = true,
title = "Acme batch",
version = "1.0.0",
website = null,
),
),
resolvedProduct = Product.Compute(
allowAllocationRequestsFrom = AllocationRequestsGroup.ALL,
category = ProductCategoryId(
id = "example-compute",
name = "example-compute",
provider = "example",
),
chargeType = ChargeType.ABSOLUTE,
cpu = 1,
cpuModel = null,
description = "An example machine",
freeToUse = false,
gpu = 0,
gpuModel = null,
hiddenInGrantApplications = false,
memoryInGigs = 2,
memoryModel = null,
name = "example-compute-1",
pricePerUnit = 1000000,
priority = 0,
productType = ProductType.COMPUTE,
unitOfPrice = ProductPriceUnit.CREDITS_PER_MINUTE,
version = 1,
balance = null,
id = "example-compute-1",
maxUsableBalance = null,
),
resolvedSupport = ResolvedSupport(
product = Product.Compute(
allowAllocationRequestsFrom = AllocationRequestsGroup.ALL,
category = ProductCategoryId(
id = "example-compute",
name = "example-compute",
provider = "example",
),
chargeType = ChargeType.ABSOLUTE,
cpu = 1,
cpuModel = null,
description = "An example machine",
freeToUse = false,
gpu = 0,
gpuModel = null,
hiddenInGrantApplications = false,
memoryInGigs = 2,
memoryModel = null,
name = "example-compute-1",
pricePerUnit = 1000000,
priority = 0,
productType = ProductType.COMPUTE,
unitOfPrice = ProductPriceUnit.CREDITS_PER_MINUTE,
version = 1,
balance = null,
id = "example-compute-1",
maxUsableBalance = null,
),
support = ComputeSupport(
docker = ComputeSupport.Docker(
enabled = true,
logs = null,
peers = null,
terminal = null,
timeExtension = null,
utilization = null,
vnc = null,
web = null,
),
maintenance = null,
native = ComputeSupport.Native(
enabled = null,
logs = null,
terminal = null,
timeExtension = null,
utilization = null,
vnc = null,
web = null,
),
product = ProductReference(
category = "example-compute",
id = "example-compute-1",
provider = "example",
),
virtualMachine = ComputeSupport.VirtualMachine(
enabled = null,
logs = null,
suspension = null,
terminal = null,
timeExtension = null,
utilization = null,
vnc = null,
),
),
),
startedAt = null,
state = JobState.IN_QUEUE,
),
updates = emptyList(),
providerGeneratedId = "54112",
)),
ucloud
).orThrow()
/*
BulkResponse(
responses = listOf(null),
)
*/
/* 📝 Note: The response in this case indicates that the Provider chose not to generate an internal ID
for this Job. If an ID was provided, then on subsequent requests the `providerGeneratedId` of this
Job would be set accordingly. This feature can help providers keep track of their internal state
without having to actively maintain a mapping. */
/* The Provider will use this information to schedule the Job on their infrastructure. Through
background processing, the Provider will keep track of this Job. The Provider notifies UCloud of
state changes as they occur. This happens through the outgoing Control API. */
JobsControl.update.call(
bulkRequestOf(ResourceUpdateAndId(
id = "54112",
update = JobUpdate(
allowRestart = null,
expectedDifferentState = null,
expectedState = null,
newMounts = null,
newTimeAllocation = null,
outputFolder = null,
state = JobState.RUNNING,
status = "The job is now running!",
timestamp = 0,
),
)),
provider
).orThrow()
/*
Unit
*/
/* 📝 Note: The timestamp field will be filled out by UCloud/Core */
/* ~ Some time later ~ */
JobsControl.update.call(
bulkRequestOf(ResourceUpdateAndId(
id = "54112",
update = JobUpdate(
allowRestart = null,
expectedDifferentState = null,
expectedState = null,
newMounts = null,
newTimeAllocation = null,
outputFolder = null,
state = JobState.SUCCESS,
status = "The job has finished processing!",
timestamp = 0,
),
)),
provider
).orThrow()
/*
Unit
*/
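The background processing described above has to decide when a state change is worth reporting. One possible approach, sketched below under stated assumptions, is to remember the last state reported for each Job and only produce an update when the observed state differs. `ExampleJobState` covers only the three states that appear in this example (UCloud defines more), and `PendingUpdate` and `JobTracker` are hypothetical names; the actual call to `JobsControl.update` is left out.

```kotlin
// Subset of job states appearing in this example. UCloud defines additional states.
enum class ExampleJobState { IN_QUEUE, RUNNING, SUCCESS }

// Hypothetical description of an update which the provider would forward
// to UCloud/Core through the outgoing Control API.
data class PendingUpdate(val jobId: String, val state: ExampleJobState, val status: String)

// Provider-side tracker which remembers the last state reported for each Job.
class JobTracker {
    private val lastReported = mutableMapOf<String, ExampleJobState>()

    // Returns an update only when the observed state differs from the last reported one.
    // Repeated polling of an unchanged Job therefore produces no duplicate notifications.
    fun observe(jobId: String, observed: ExampleJobState, status: String): PendingUpdate? {
        if (lastReported[jobId] == observed) return null
        lastReported[jobId] = observed
        return PendingUpdate(jobId, observed, status)
    }
}
```

In the flow above, the first observation of Job "54112" as `RUNNING` would produce an update, further observations of `RUNNING` would return `null`, and the later observation of `SUCCESS` would produce the second update.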
Communication Flow: Curl
# ------------------------------------------------------------------------------------------------------
# $host is the UCloud instance to contact. Example: 'http://localhost:8080' or 'https://cloud.sdu.dk'
# $accessToken is a valid access-token issued by UCloud
# ------------------------------------------------------------------------------------------------------
# In this example we will show the creation of a simple batch Job. The procedure starts when the
# Provider receives a create request from UCloud/Core.
# The request below contains a lot of information. We recommend that you read about and understand
# Products, Applications and Jobs before you continue. We will attempt to summarize the information
# below:
#
# - The request contains one or more Jobs. The Provider should schedule each of them on their
# infrastructure.
# - The `id` of a Job is unique, globally, in UCloud.
# - The `owner` references the UCloud identity and workspace of the creator
# - The `specification` contains the user's request
# - The `status` contains UCloud's view of the Job _AND_ resolved resources required for the Job
#
# In this example:
#
# - Exactly one Job will be created.
# - `items` contains only one Job
#
# - This Job will run a `BATCH` application
# - See `status.resolvedApplication.invocation.applicationType`
#
# - It will run on the `example-compute-1` machine-type
# - See `specification.product` and `status.resolvedProduct`
#
# - The application should launch the `acme/batch:1.0.0` container
# - `status.resolvedApplication.invocation.tool.tool.description.backend`
# - `status.resolvedApplication.invocation.tool.tool.description.image`
#
# - It will be invoked with `acme-batch --debug "Hello, World!"`.
# - The invocation is created from `status.resolvedApplication.invocation.invocation`
# - With parameters defined in `status.resolvedApplication.invocation.parameters`
# - And values defined in `specification.parameters`
#
# - The Job should be scheduled with a max wall-time of 1 hour
# - See `specification.timeAllocation`
#
# - ...on exactly 1 node.
# - See `specification.replicas`
# Authenticated as ucloud
curl -XPOST -H "Authorization: Bearer $accessToken" -H "Content-Type: application/json; charset=utf-8" "$host/ucloud/PROVIDERID/jobs" -d '{
"items": [
{
"id": "54112",
"owner": {
"createdBy": "user",
"project": null
},
"updates": [
],
"specification": {
"application": {
"name": "acme-batch",
"version": "1.0.0"
},
"product": {
"id": "example-compute-1",
"category": "example-compute",
"provider": "example"
},
"name": null,
"replicas": 1,
"allowDuplicateJob": false,
"parameters": {
"debug": {
"type": "boolean",
"value": true
},
"value": {
"type": "text",
"value": "Hello, World!"
}
},
"resources": null,
"timeAllocation": {
"hours": 1,
"minutes": 0,
"seconds": 0
},
"openedFile": null,
"restartOnExit": null,
"sshEnabled": null
},
"status": {
"state": "IN_QUEUE",
"jobParametersJson": null,
"startedAt": null,
"expiresAt": null,
"resolvedApplication": {
"metadata": {
"name": "acme-batch",
"version": "1.0.0",
"authors": [
"UCloud"
],
"title": "Acme batch",
"description": "An example application",
"website": null,
"public": true,
"flavorName": null,
"group": {
"metadata": {
"id": 0
},
"specification": {
"title": "Test Group",
"description": "",
"defaultFlavor": null,
"categories": [
],
"colorReplacement": {
"light": null,
"dark": null
},
"logoHasText": false
},
"status": {
"applications": null
}
},
"createdAt": 1717663228341
},
"invocation": {
"tool": {
"name": "acme-batch",
"version": "1.0.0",
"tool": {
"owner": "_ucloud",
"createdAt": 1633329776235,
"modifiedAt": 1633329776235,
"description": {
"info": {
"name": "acme-batch",
"version": "1.0.0"
},
"container": null,
"defaultNumberOfNodes": 1,
"defaultTimeAllocation": {
"hours": 1,
"minutes": 0,
"seconds": 0
},
"requiredModules": [
],
"authors": [
"UCloud"
],
"title": "Acme batch",
"description": "An example tool",
"backend": "DOCKER",
"license": "None",
"image": "acme/batch:1.0.0",
"supportedProviders": null
}
}
},
"invocation": [
{
"type": "word",
"word": "acme-batch"
},
{
"type": "var",
"variableNames": [
"debug"
],
"prefixGlobal": "--debug ",
"suffixGlobal": "",
"prefixVariable": "",
"suffixVariable": "",
"isPrefixVariablePartOfArg": false,
"isSuffixVariablePartOfArg": false
},
{
"type": "var",
"variableNames": [
"value"
],
"prefixGlobal": "",
"suffixGlobal": "",
"prefixVariable": "",
"suffixVariable": "",
"isPrefixVariablePartOfArg": false,
"isSuffixVariablePartOfArg": false
}
],
"parameters": [
{
"type": "boolean",
"name": "debug",
"optional": false,
"defaultValue": null,
"title": "",
"description": "Should debug be enabled?",
"trueValue": "true",
"falseValue": "false"
},
{
"type": "text",
"name": "value",
"optional": false,
"defaultValue": null,
"title": "",
"description": "The value for the batch application"
}
],
"outputFileGlobs": [
"*"
],
"applicationType": "BATCH",
"vnc": null,
"web": null,
"ssh": null,
"container": null,
"environment": null,
"allowAdditionalMounts": null,
"allowAdditionalPeers": null,
"allowMultiNode": false,
"allowPublicIp": false,
"allowPublicLink": null,
"fileExtensions": [
],
"licenseServers": [
],
"modules": null
}
},
"resolvedSupport": {
"product": {
"balance": null,
"maxUsableBalance": null,
"name": "example-compute-1",
"pricePerUnit": 1000000,
"category": {
"name": "example-compute",
"provider": "example"
},
"description": "An example machine",
"priority": 0,
"cpu": 1,
"memoryInGigs": 2,
"gpu": 0,
"cpuModel": null,
"memoryModel": null,
"gpuModel": null,
"version": 1,
"freeToUse": false,
"allowAllocationRequestsFrom": "ALL",
"unitOfPrice": "CREDITS_PER_MINUTE",
"chargeType": "ABSOLUTE",
"hiddenInGrantApplications": false,
"productType": "COMPUTE"
},
"support": {
"product": {
"id": "example-compute-1",
"category": "example-compute",
"provider": "example"
},
"docker": {
"enabled": true,
"web": null,
"vnc": null,
"logs": null,
"terminal": null,
"peers": null,
"timeExtension": null,
"utilization": null
},
"virtualMachine": {
"enabled": null,
"logs": null,
"vnc": null,
"terminal": null,
"timeExtension": null,
"suspension": null,
"utilization": null
},
"native": {
"enabled": null,
"logs": null,
"vnc": null,
"terminal": null,
"timeExtension": null,
"utilization": null,
"web": null
},
"maintenance": null
}
},
"resolvedProduct": {
"balance": null,
"maxUsableBalance": null,
"name": "example-compute-1",
"pricePerUnit": 1000000,
"category": {
"name": "example-compute",
"provider": "example"
},
"description": "An example machine",
"priority": 0,
"cpu": 1,
"memoryInGigs": 2,
"gpu": 0,
"cpuModel": null,
"memoryModel": null,
"gpuModel": null,
"version": 1,
"freeToUse": false,
"allowAllocationRequestsFrom": "ALL",
"unitOfPrice": "CREDITS_PER_MINUTE",
"chargeType": "ABSOLUTE",
"hiddenInGrantApplications": false,
"productType": "COMPUTE"
},
"allowRestart": false
},
"createdAt": 1633329776235,
"output": null,
"permissions": null
}
]
}'
# {
# "responses": [
# null
# ]
# }
# 📝 Note: The response in this case indicates that the Provider chose not to generate an internal ID
# for this Job. If an ID was provided, then on subsequent requests the `providerGeneratedId` of this
# Job would be set accordingly. This feature can help providers keep track of their internal state
# without having to actively maintain a mapping.
# The Provider will use this information to schedule the Job on their infrastructure. Through
# background processing, the Provider will keep track of this Job. The Provider notifies UCloud of
# state changes as they occur. This happens through the outgoing Control API.
# Authenticated as provider
curl -XPOST -H "Authorization: Bearer $accessToken" -H "Content-Type: application/json; charset=utf-8" "$host/api/jobs/control/update" -d '{
"items": [
{
"id": "54112",
"update": {
"state": "RUNNING",
"outputFolder": null,
"status": "The job is now running!",
"expectedState": null,
"expectedDifferentState": null,
"newTimeAllocation": null,
"allowRestart": null,
"newMounts": null,
"timestamp": 0
}
}
]
}'
# {
# }
# 📝 Note: The timestamp field will be filled out by UCloud/Core
# ~ Some time later ~
curl -XPOST -H "Authorization: Bearer $accessToken" -H "Content-Type: application/json; charset=utf-8" "$host/api/jobs/control/update" -d '{
"items": [
{
"id": "54112",
"update": {
"state": "SUCCESS",
"outputFolder": null,
"status": "The job has finished processing!",
"expectedState": null,
"expectedDifferentState": null,
"newTimeAllocation": null,
"allowRestart": null,
"newMounts": null,
"timestamp": 0
}
}
]
}'
# {
# }
Example: Accounting
Frequency of use | Common |
---|---|
Pre-conditions |
|
Actors |
|
Communication Flow: Kotlin
/* In this example, we show how a Provider can implement accounting. Accounting is done periodically
by the Provider in a background process. We recommend that Providers combine this with the same
background processing required for state changes. */
/* You should read and understand how Products work in UCloud. UCloud supports multiple ways of accounting
for usage. The most common one, which we show here, is the `CREDITS_PER_MINUTE` policy. This policy
requires that a Provider charges credits (1 credit = 1/1_000_000 DKK) for every minute of usage. */
/* We assume that the Provider has just determined that Jobs "51231" (single replica) and "63489"
(23 replicas) have each used 15 minutes of compute time since the last accounting iteration. */
JobsControl.chargeCredits.call(
bulkRequestOf(ResourceChargeCredits(
chargeId = "51231-charge-04-oct-2021-12:30",
description = null,
id = "51231",
performedBy = null,
periods = 1,
units = 15,
), ResourceChargeCredits(
chargeId = "63489-charge-04-oct-2021-12:30",
description = null,
id = "63489",
performedBy = null,
periods = 23,
units = 15,
)),
provider
).orThrow()
/*
ResourceChargeCreditsResponse(
duplicateCharges = emptyList(),
insufficientFunds = emptyList(),
)
*/
/* 📝 Note: Because the ProductPriceUnit of the Product associated with the Job is
`CREDITS_PER_MINUTE`, each unit corresponds to one minute of usage. A different ProductPriceUnit, for
example `CREDITS_PER_HOUR`, would alter the definition of this unit. */
/* 📝 Note: The chargeId is an identifier which must be unique for any charge made by the Provider.
If the Provider sends another charge request with the same ID, then that request will be ignored. We
recommend that Providers use this to their advantage and include, for example, a timestamp from
the last iteration. This means that you, as a Provider, cannot accidentally charge twice for the
same usage. */
/* In the next iteration, the Provider also determines that 15 minutes have passed for these Jobs. */
JobsControl.chargeCredits.call(
bulkRequestOf(ResourceChargeCredits(
chargeId = "51231-charge-04-oct-2021-12:45",
description = null,
id = "51231",
performedBy = null,
periods = 1,
units = 15,
), ResourceChargeCredits(
chargeId = "63489-charge-04-oct-2021-12:45",
description = null,
id = "63489",
performedBy = null,
periods = 23,
units = 15,
)),
provider
).orThrow()
/*
ResourceChargeCreditsResponse(
duplicateCharges = emptyList(),
insufficientFunds = listOf(FindByStringId(
id = "63489",
)),
)
*/
/* However, this time UCloud has told us that 63489 no longer has enough credits to pay for this.
The Provider should respond to this by immediately cancelling the Job. UCloud/Core does not perform
this step for you! */
/* 📝 Note: This request should be triggered by the normal life-cycle handler. */
JobsControl.update.call(
bulkRequestOf(ResourceUpdateAndId(
id = "63489",
update = JobUpdate(
allowRestart = null,
expectedDifferentState = null,
expectedState = null,
newMounts = null,
newTimeAllocation = null,
outputFolder = null,
state = JobState.SUCCESS,
status = "The job was terminated (No credits)",
timestamp = 0,
),
)),
provider
).orThrow()
/*
Unit
*/
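The two notes in the flow above (a unique chargeId per accounting iteration, and cancelling Jobs reported in `insufficientFunds`) can be sketched as follows. This is a minimal illustration under stated assumptions and not part of the UCloud API: `chargeIdFor` and `jobsToTerminate` are hypothetical helpers, and the ID format simply mirrors the one used in this example.

```kotlin
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import java.util.Locale

// Assumption: the chargeId format is our own choice, mirroring the example above.
val chargeIdFormat: DateTimeFormatter =
    DateTimeFormatter.ofPattern("dd-MMM-yyyy-HH:mm", Locale.ENGLISH)

// Hypothetical helper: derives a deterministic chargeId from the job ID and the end of
// the accounting window. Retrying the same window produces the same ID again.
fun chargeIdFor(jobId: String, windowEnd: LocalDateTime): String =
    "$jobId-charge-${windowEnd.format(chargeIdFormat).lowercase(Locale.ROOT)}"

// Hypothetical helper: of the job IDs reported in `insufficientFunds`, select the ones
// the Provider still tracks as running and therefore has to cancel itself.
fun jobsToTerminate(insufficientFunds: List<String>, runningJobs: Set<String>): List<String> =
    insufficientFunds.filter { it in runningJobs }
```

Deriving the ID from the accounting window, rather than from the time the request is sent, is what makes a retry after a crash or timeout reuse the same chargeId.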
Communication Flow: Curl
# ------------------------------------------------------------------------------------------------------
# $host is the UCloud instance to contact. Example: 'http://localhost:8080' or 'https://cloud.sdu.dk'
# $accessToken is a valid access-token issued by UCloud
# ------------------------------------------------------------------------------------------------------
# In this example, we show how a Provider can implement accounting. Accounting is done periodically
# by the Provider in a background process. We recommend that Providers combine this with the same
# background processing required for state changes.
# You should read and understand how Products work in UCloud. UCloud supports multiple ways of accounting
# for usage. The most common one, which we show here, is the `CREDITS_PER_MINUTE` policy. This policy
# requires that a Provider charges credits (1 credit = 1/1_000_000 DKK) for every minute of usage.
# We assume that the Provider has just determined that Jobs "51231" (single replica) and "63489"
# (23 replicas) have each used 15 minutes of compute time since the last accounting iteration.
# Authenticated as provider
curl -XPOST -H "Authorization: Bearer $accessToken" -H "Content-Type: application/json; charset=utf-8" "$host/api/jobs/control/chargeCredits" -d '{
"items": [
{
"id": "51231",
"chargeId": "51231-charge-04-oct-2021-12:30",
"units": 15,
"periods": 1,
"performedBy": null,
"description": null
},
{
"id": "63489",
"chargeId": "63489-charge-04-oct-2021-12:30",
"units": 15,
"periods": 23,
"performedBy": null,
"description": null
}
]
}'
# {
# "insufficientFunds": [
# ],
# "duplicateCharges": [
# ]
# }
# 📝 Note: Because the ProductPriceUnit of the Product associated with the Job is
# `CREDITS_PER_MINUTE`, each unit corresponds to one minute of usage. A different ProductPriceUnit, for
# example `CREDITS_PER_HOUR`, would alter the definition of this unit.
# 📝 Note: The chargeId is an identifier which must be unique for any charge made by the Provider.
# If the Provider sends another charge request with the same ID, then that request will be ignored. We
# recommend that Providers use this to their advantage and include, for example, a timestamp from
# the last iteration. This means that you, as a Provider, cannot accidentally charge twice for the
# same usage.
# In the next iteration, the Provider also determines that 15 minutes have passed for these Jobs.
curl -XPOST -H "Authorization: Bearer $accessToken" -H "Content-Type: application/json; charset=utf-8" "$host/api/jobs/control/chargeCredits" -d '{
"items": [
{
"id": "51231",
"chargeId": "51231-charge-04-oct-2021-12:45",
"units": 15,
"periods": 1,
"performedBy": null,
"description": null
},
{
"id": "63489",
"chargeId": "63489-charge-04-oct-2021-12:45",
"units": 15,
"periods": 23,
"performedBy": null,
"description": null
}
]
}'
# {
# "insufficientFunds": [
# {
# "id": "63489"
# }
# ],
# "duplicateCharges": [
# ]
# }
# However, this time UCloud has told us that 63489 no longer has enough credits to pay for this.
# The Provider should respond to this by immediately cancelling the Job. UCloud/Core does not perform
# this step for you!
# 📝 Note: This request should be triggered by the normal life-cycle handler.
curl -XPOST -H "Authorization: Bearer $accessToken" -H "Content-Type: application/json; charset=utf-8" "$host/api/jobs/control/update" -d '{
"items": [
{
"id": "63489",
"update": {
"state": "SUCCESS",
"outputFolder": null,
"status": "The job was terminated (No credits)",
"expectedState": null,
"expectedDifferentState": null,
"newTimeAllocation": null,
"allowRestart": null,
"newMounts": null,
"timestamp": 0
}
}
]
}'
# {
# }
Example: Ensuring UCloud/Core and Provider are in-sync¶
Frequency of use | Common |
---|---|
Pre-conditions |
|
Actors |
|
Communication Flow: Kotlin
/* In this example, we will explore the mechanism that UCloud/Core uses to ensure that the Provider
is synchronized with the core. */
/* UCloud/Core will periodically send the Provider a batch of active Jobs. If the Provider is unable
to recognize one or more of these Jobs, it should respond by updating the state of the affected
Job(s). */
JobsProvider.verify.call(
bulkRequestOf(Job(
createdAt = 1633329776235,
id = "54112",
output = null,
owner = ResourceOwner(
createdBy = "user",
project = null,
),
permissions = null,
specification = JobSpecification(
allowDuplicateJob = false,
application = NameAndVersion(
name = "acme-batch",
version = "1.0.0",
),
name = null,
openedFile = null,
parameters = mapOf("debug" to AppParameterValue.Bool(
value = true,
), "value" to AppParameterValue.Text(
value = "Hello, World!",
)),
product = ProductReference(
category = "example-compute",
id = "example-compute-1",
provider = "example",
),
replicas = 1,
resources = null,
restartOnExit = null,
sshEnabled = null,
timeAllocation = SimpleDuration(
hours = 1,
minutes = 0,
seconds = 0,
),
),
status = JobStatus(
allowRestart = false,
expiresAt = null,
jobParametersJson = null,
resolvedApplication = null,
resolvedProduct = null,
resolvedSupport = null,
startedAt = null,
state = JobState.RUNNING,
),
updates = emptyList(),
providerGeneratedId = "54112",
)),
ucloud
).orThrow()
/*
Unit
*/
/* In this case, the Provider does not recognize 54112 */
JobsControl.update.call(
bulkRequestOf(ResourceUpdateAndId(
id = "54112",
update = JobUpdate(
allowRestart = null,
expectedDifferentState = null,
expectedState = null,
newMounts = null,
newTimeAllocation = null,
outputFolder = null,
state = JobState.FAILURE,
status = "Your job is no longer available",
timestamp = 0,
),
)),
provider
).orThrow()
/*
Unit
*/
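The reconciliation step in the flow above can be sketched as a pure function. This is a minimal sketch under stated assumptions: `PendingUpdate` and `reconcile` are hypothetical names and not part of the UCloud API, and the set of known Jobs stands in for whatever internal state the Provider keeps.

```kotlin
// Hypothetical stand-in for the update a Provider would send through JobsControl.update.
data class PendingUpdate(val jobId: String, val state: String, val status: String)

// Returns one FAILURE update for every Job in the verify batch that the Provider
// does not recognize. Jobs that are known produce no update.
fun reconcile(verifyBatch: List<String>, knownJobs: Set<String>): List<PendingUpdate> =
    verifyBatch
        .filterNot { it in knownJobs }
        .map { PendingUpdate(it, "FAILURE", "Your job is no longer available") }
```

Each resulting entry would then be turned into a `JobUpdate` and sent in a single bulk request, as in the flow above.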
Communication Flow: Curl
# ------------------------------------------------------------------------------------------------------
# $host is the UCloud instance to contact. Example: 'http://localhost:8080' or 'https://cloud.sdu.dk'
# $accessToken is a valid access-token issued by UCloud
# ------------------------------------------------------------------------------------------------------
# In this example, we will explore the mechanism that UCloud/Core uses to ensure that the Provider
# is synchronized with the core.
# UCloud/Core will periodically send the Provider a batch of active Jobs. If the Provider is unable
# to recognize one or more of these Jobs, it should respond by updating the state of the affected
# Job(s).
# Authenticated as ucloud
curl -XPOST -H "Authorization: Bearer $accessToken" -H "Content-Type: application/json; charset=utf-8" "$host/ucloud/PROVIDERID/jobs/verify" -d '{
"items": [
{
"id": "54112",
"owner": {
"createdBy": "user",
"project": null
},
"updates": [
],
"specification": {
"application": {
"name": "acme-batch",
"version": "1.0.0"
},
"product": {
"id": "example-compute-1",
"category": "example-compute",
"provider": "example"
},
"name": null,
"replicas": 1,
"allowDuplicateJob": false,
"parameters": {
"debug": {
"type": "boolean",
"value": true
},
"value": {
"type": "text",
"value": "Hello, World!"
}
},
"resources": null,
"timeAllocation": {
"hours": 1,
"minutes": 0,
"seconds": 0
},
"openedFile": null,
"restartOnExit": null,
"sshEnabled": null
},
"status": {
"state": "RUNNING",
"jobParametersJson": null,
"startedAt": null,
"expiresAt": null,
"resolvedApplication": null,
"resolvedSupport": null,
"resolvedProduct": null,
"allowRestart": false
},
"createdAt": 1633329776235,
"output": null,
"permissions": null
}
]
}'
# {
# }
# In this case, the Provider does not recognize 54112
# Authenticated as provider
curl -XPOST -H "Authorization: Bearer $accessToken" -H "Content-Type: application/json; charset=utf-8" "$host/api/jobs/control/update" -d '{
"items": [
{
"id": "54112",
"update": {
"state": "FAILURE",
"outputFolder": null,
"status": "Your job is no longer available",
"expectedState": null,
"expectedDifferentState": null,
"newTimeAllocation": null,
"allowRestart": null,
"newMounts": null,
"timestamp": 0
}
}
]
}'
# {
# }
Remote Procedure Calls¶
follow¶
Follow the progress of a job
Request | Response | Error |
---|---|---|
JobsProviderFollowRequest | JobsProviderFollowResponse | CommonErrorMessage |
Implementation requirements:
For more information, see the end-user API (jobs.follow)
retrieveProducts¶
Retrieve product support for this provider
Request | Response | Error |
---|---|---|
Unit | BulkResponse<ComputeSupport> | CommonErrorMessage |
This endpoint responds with the Products supported by this provider, along with details of how each Product is supported. The Products must already be registered with UCloud/Core.
retrieveUtilization¶
Retrieve information about how busy the provider’s cluster currently is
Request | Response | Error |
---|---|---|
JobsProviderUtilizationRequest | JobsRetrieveUtilizationResponse | CommonErrorMessage |
Implementation requirements:
For more information, see the end-user API (jobs.retrieveUtilization)
create¶
Creates one or more resources
Request | Response | Error |
---|---|---|
BulkRequest<Job> | BulkResponse<FindByStringId> | CommonErrorMessage |
Implementation requirements:
For more information, see the end-user API (jobs.create)
extend¶
Extend the duration of one or more jobs
Request | Response | Error |
---|---|---|
BulkRequest<JobsProviderExtendRequestItem> | BulkResponse<Unit> | CommonErrorMessage |
Implementation requirements:
For more information, see the end-user API (jobs.extend)
init¶
Request from the user to (potentially) initialize any resources
Request | Response | Error |
---|---|---|
ResourceInitializationRequest | Unit | CommonErrorMessage |
This request is sent by the client if the client believes that initialization of resources might be needed. NOTE: This request might be sent even if initialization has already taken place. UCloud/Core does not check whether initialization has already taken place; it simply validates the request.
openInteractiveSession¶
Opens an interactive session (e.g. terminal, web or VNC)
Request | Response | Error |
---|---|---|
BulkRequest<JobsProviderOpenInteractiveSessionRequestItem> | BulkResponse<OpenSession> | CommonErrorMessage |
Implementation requirements:
For more information, see the end-user API (jobs.openInteractiveSession)
suspend¶
Suspend a job
Request | Response | Error |
---|---|---|
BulkRequest<JobsProviderSuspendRequestItem> | BulkResponse<Unit> | CommonErrorMessage |
Implementation requirements:
For more information, see the end-user API (jobs.suspend)
terminate¶
Request job cancellation and destruction
Request | Response | Error |
---|---|---|
BulkRequest<Job> | BulkResponse<Unit> | CommonErrorMessage |
Implementation requirements: Mandatory
For more information, see the end-user API (jobs.terminate)
unsuspend¶
Unsuspends a job
Request | Response | Error |
---|---|---|
BulkRequest<JobsProviderSuspendRequestItem> | BulkResponse<Unit> | CommonErrorMessage |
Implementation requirements:
For more information, see the end-user API (jobs.unsuspend)
updateAcl¶
Callback received by the Provider when permissions are updated
Request | Response | Error |
---|---|---|
BulkRequest<UpdatedAclWithResource<Job>> | BulkResponse<Unit> | CommonErrorMessage |
This endpoint is mandatory for Providers to implement. If the Provider does not need to keep internal state, then they may simply ignore this request by responding with 200 OK. The Provider MUST reply with an OK status. UCloud/Core will fail the request if the Provider does not acknowledge the request.
verify¶
Invoked by UCloud/Core to trigger verification of a single batch
Request | Response | Error |
---|---|---|
BulkRequest<Job> | Unit | CommonErrorMessage |
This endpoint is periodically invoked by UCloud/Core for resources which are deemed active. The Provider should immediately determine if these are still valid and recognized by the Provider. If any of the resources are not valid, then the Provider should notify UCloud/Core by issuing an update for each affected resource.
Data Models¶
JobsProviderExtendRequestItem¶
A request to extend the timeAllocation of a Job
data class JobsProviderExtendRequestItem(
    val job: Job,
    val requestedTime: SimpleDuration,
)
Properties
job: Job
The affected Job
requestedTime: SimpleDuration
The requested extension. It will be added to the current timeAllocation.
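As an illustration of how a requested extension is added to the current timeAllocation, the following sketch adds two durations and normalizes the result. `DurationSketch` is a hypothetical local stand-in for SimpleDuration, used only so that the example is self-contained.

```kotlin
// Hypothetical local stand-in for SimpleDuration; the real type belongs to the UCloud API.
data class DurationSketch(val hours: Int, val minutes: Int, val seconds: Int) {
    // Total length of this duration in seconds.
    fun toSeconds(): Long = hours * 3600L + minutes * 60L + seconds

    // Adds two durations and normalizes minutes and seconds into the range 0..59.
    operator fun plus(other: DurationSketch): DurationSketch {
        val total = toSeconds() + other.toSeconds()
        return DurationSketch(
            hours = (total / 3600).toInt(),
            minutes = ((total % 3600) / 60).toInt(),
            seconds = (total % 60).toInt(),
        )
    }
}
```

For example, a Job with a timeAllocation of one hour that receives a requestedTime of 45 minutes and 30 seconds ends up with `DurationSketch(1, 45, 30)`.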
JobsProviderFollowRequest¶
A request to start/stop a follow session
sealed class JobsProviderFollowRequest {
    class CancelStream : JobsProviderFollowRequest()
    class Init : JobsProviderFollowRequest()
}
JobsProviderFollowRequest.CancelStream¶
Stop an existing follow session for a given Job
data class CancelStream(
    val streamId: String,
    val type: String /* "cancel" */,
)
JobsProviderFollowRequest.Init¶
Start a new follow session for a given Job
data class Init(
    val job: Job,
    val type: String /* "init" */,
)
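A Provider typically dispatches on the two request variants. The sketch below shows one possible shape, under stated assumptions: `FollowRequestSketch` and `FollowSessions` are hypothetical simplifications (the Job is reduced to its ID), and generating a UUID as the stream ID is only one way to obtain a unique identifier per session.

```kotlin
import java.util.UUID

// Hypothetical, simplified mirror of JobsProviderFollowRequest for illustration only.
sealed class FollowRequestSketch {
    data class Init(val jobId: String) : FollowRequestSketch()
    data class CancelStream(val streamId: String) : FollowRequestSketch()
}

// Hypothetical in-memory registry of active follow sessions (streamId -> jobId).
class FollowSessions {
    private val active = mutableMapOf<String, String>()

    // Starts or stops a session and returns the streamId the request concerns.
    fun handle(request: FollowRequestSketch): String = when (request) {
        is FollowRequestSketch.Init -> {
            val streamId = UUID.randomUUID().toString()
            active[streamId] = request.jobId
            streamId
        }
        is FollowRequestSketch.CancelStream -> {
            active.remove(request.streamId)
            request.streamId
        }
    }

    fun isActive(streamId: String): Boolean = streamId in active
}
```

Because the class is sealed, the `when` expression is exhaustive, so adding a new request variant forces the handler to be updated.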
JobsProviderOpenInteractiveSessionRequestItem¶
A request for opening a new interactive session (e.g. terminal)
data class JobsProviderOpenInteractiveSessionRequestItem(
    val job: Job,
    val rank: Int,
    val sessionType: InteractiveSessionType,
)
Properties
job: Job
The fully resolved Job
rank: Int
The rank of the node (0-indexed)
Valid values range from 0 (inclusive) until specification.replicas (exclusive)
sessionType: InteractiveSessionType
The type of session
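The valid range for rank maps directly onto a half-open range check. A minimal sketch, where `isValidRank` is a hypothetical helper and `replicas` stands for the value of specification.replicas:

```kotlin
// Hypothetical helper: true if `rank` addresses one of the `replicas` nodes of a Job.
// `0 until replicas` includes 0 and excludes `replicas`, matching the rule above.
fun isValidRank(rank: Int, replicas: Int): Boolean = rank in 0 until replicas
```

For a Job with a single replica, only rank 0 passes this check.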
JobsProviderSuspendRequestItem¶
data class JobsProviderSuspendRequestItem(
    val job: Job,
)
Properties
job: Job
JobsProviderUtilizationRequest¶
data class JobsProviderUtilizationRequest(
    val categoryId: String,
)
Properties
categoryId: String
JobsProviderFollowResponse¶
A message emitted by the Provider in a follow session
data class JobsProviderFollowResponse(
    val streamId: String,
    val rank: Int,
    val stdout: String?,
    val stderr: String?,
)
Properties
streamId: String
A unique ID for this follow session. The same identifier should be used for the entire session.
We recommend that Providers generate a UUID or similar for this ID.
rank: Int
The rank of the node (0-indexed)
Valid values range from 0 (inclusive) until specification.replicas (exclusive)
stdout: String?
New messages from stdout (if any)
The bytes from stdout of the running process should be interpreted as UTF-8. If the stream contains invalid bytes, then these should be ignored and skipped.
See https://linux.die.net/man/3/stdout for more information.
stderr: String?
New messages from stderr (if any)
The bytes from stderr of the running process should be interpreted as UTF-8. If the stream contains invalid bytes, then these should be ignored and skipped.
See https://linux.die.net/man/3/stderr for more information.
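The rule for stdout and stderr (interpret as UTF-8, skip invalid bytes) can be implemented on the JVM with a `CharsetDecoder` configured to ignore malformed input. This is a minimal sketch: `decodeLenientUtf8` is a hypothetical helper name, and it assumes each call receives a complete chunk. A multi-byte character split across two chunks would be dropped, so a real implementation may want to buffer incomplete trailing bytes.

```kotlin
import java.nio.ByteBuffer
import java.nio.charset.CodingErrorAction

// Hypothetical helper: decodes a chunk of process output as UTF-8 and silently
// drops any bytes that do not form valid UTF-8 instead of replacing or failing.
fun decodeLenientUtf8(bytes: ByteArray): String {
    val decoder = Charsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.IGNORE)
        .onUnmappableCharacter(CodingErrorAction.IGNORE)
    return decoder.decode(ByteBuffer.wrap(bytes)).toString()
}
```

A new decoder is created per call because `CharsetDecoder` instances are stateful and not safe for concurrent use.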