Label Studio

Label Studio is an open-source data labeling tool that can handle different data types, such as audio, text, images, and HTML. It allows users to label, annotate, and explore their data. It also provides a machine learning interface that supports various training techniques, such as active learning and supervised learning.

For more information, check here.

Account information

Upon startup, the Label Studio web interface will prompt you to sign up or create a new user account.

Sign up

You will need to create an account when starting a new Label Studio project. For this, import an empty folder from your UCloud workspace using the mandatory --data-dir parameter.

When signing up, the username must be an email address and the password must be at least 8 characters long. There is no email verification.

The login credentials can also be specified before submitting the job via the --username and --password optional parameters.
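
Because a job submitted with unsuitable credentials only fails at login time, it can help to check the values beforehand. The following is a minimal sketch of the two rules stated above; the function name check_credentials is hypothetical, and the email test is a rough shape check that is assumed to be looser than whatever validation Label Studio itself applies.

```python
def check_credentials(username: str, password: str) -> list[str]:
    """Return a list of problems; an empty list means the values look acceptable.

    Hypothetical helper, not part of Label Studio.
    """
    problems = []
    # Rough email shape check: exactly one "@", a non-empty local part,
    # and a dot somewhere in the domain.
    local, sep, domain = username.partition("@")
    if not (sep and local and "." in domain and "@" not in domain):
        problems.append("username must be an email address")
    # The sign-up rule described above: at least 8 characters.
    if len(password) < 8:
        problems.append("password must be at least 8 characters long")
    return problems
```

An empty result means the values can be passed to the --username and --password parameters.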

Note

The sign up page is disabled when the app is started with a public link.

Log in

You can log in with the user credentials created in a previous session.

Database setup

Label Studio uses SQLite by default. You don’t need to configure anything. Label Studio stores all data in a single file inside the directory imported using the --data-dir option.

For more information, check here.
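
To confirm that a data directory from a previous session really contains a Label Studio database, the SQLite file can be opened read-only with Python's standard sqlite3 module. This is a sketch under assumptions: the helper name list_tables is hypothetical, and the file name label_studio.sqlite3 used in the usage comment is assumed, so check the actual name inside your own directory.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path):
    """Return the table names stored in a SQLite file, opened read-only."""
    # mode=ro prevents accidental writes to the database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# Usage (path and file name are assumptions):
# print(list_tables("/work/mydatadir/label_studio.sqlite3"))
```

Opening the file read-only keeps the check safe, but it is still better to run it while no Label Studio job is using the same directory.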

Import & export files

Files for labeling (e.g. images) can be imported using the Label Studio user interface. By default, files are uploaded directly from the user's computer.

To add files stored on a UCloud drive, import the corresponding directory, e.g. mydataset, within the application using the optional Select folder to use parameter. In Label Studio, after creating a project, go to the project settings, and under the Cloud Storage tab, select Add Source Storage or Add Target Storage. Then, select Local files as Storage Type and insert the Absolute local path, e.g. /work/mydataset. Finally, click on Check Connection and press Add Storage to import your data inside the project.

If there are files other than JSON files, activate the switch Treat every bucket object as a source file. In the File Filter Regex field, you can specify a regular expression to filter bucket objects. Use .* to collect all objects. After adding the storage, click on Sync Storage to collect tasks from the bucket. For more information, check here.
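
A filter expression can be tried out on a few sample paths before it is entered in the File Filter Regex field. The sketch below assumes the filter follows Python regular-expression syntax and is matched from the start of each path; the function name filter_paths and the sample paths are hypothetical.

```python
import re


def filter_paths(paths, pattern):
    """Keep the paths that the pattern matches from their beginning."""
    regex = re.compile(pattern)
    # re.match anchors at the start of the string, so a pattern that should
    # match anywhere in the path needs a leading ".*".
    return [p for p in paths if regex.match(p)]


sample = [
    "/work/mydataset/a.jpg",
    "/work/mydataset/b.json",
    "/work/mydataset/sub/c.png",
]
images_only = filter_paths(sample, r".*\.(jpg|png)$")
everything = filter_paths(sample, r".*")
```

Here images_only keeps the two image files, while everything keeps all three paths, as described for the .* expression above.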

During export, the data is downloaded via the browser in the specified format. Additionally, a log entry is created in the export subfolder within the input directory.
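
An exported file can be summarized with a few lines of Python. The sketch below assumes the JSON export format, where the file holds a list of tasks, each with an annotations list whose result items carry the labels, and it assumes a classification project that uses choices; other project types store different keys under value. The function name count_choices is hypothetical.

```python
from collections import Counter


def count_choices(tasks):
    """Count how often each choice label occurs across all annotations."""
    counts = Counter()
    for task in tasks:
        for annotation in task.get("annotations", []):
            for item in annotation.get("result", []):
                # Result items without a "choices" value are skipped.
                for choice in item.get("value", {}).get("choices", []):
                    counts[choice] += 1
    return counts
```

To use it, load the downloaded file with json.load and pass the resulting list to count_choices.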

ML backend

Label Studio ML backend is an SDK that allows users to start a server to integrate Label Studio with machine learning models.

Initialize the server

To initialize the server, open a terminal window inside the app and run the command:

$ label-studio-ml init --root-dir /work my_ml_backend

This will create a new directory /work/my_ml_backend with templates to define your ML backend model:

my_ml_backend/
├── Dockerfile
├── README.md
├── __pycache__
│   └── model.cpython-310.pyc
├── _wsgi.py
├── docker-compose.yml
├── model.py
└── requirements.txt

The model dependencies should be added to the requirements.txt file.

Note

These dependencies can be installed automatically before launching the Label Studio app by importing the requirements.txt file via the Initialization optional parameter.

For machine learning example tutorials, check here.
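
The prediction logic in model.py ultimately has to return results in the structure that Label Studio displays as pre-annotations. As a rough illustration, the sketch below builds one such prediction for a single-choice classification task. The helper name make_choice_prediction and the default values sentiment, text and sketch-v0 are hypothetical; from_name and to_name must match the tag names in your own labeling configuration.

```python
def make_choice_prediction(label, score, from_name="sentiment",
                           to_name="text", model_version="sketch-v0"):
    """Build one prediction dict for a single-choice classification task.

    Hypothetical helper: adapt from_name/to_name to the labeling config.
    """
    return {
        "model_version": model_version,
        "score": float(score),
        "result": [
            {
                "from_name": from_name,  # name of the control tag (assumed)
                "to_name": to_name,      # name of the object tag (assumed)
                "type": "choices",
                "value": {"choices": [label]},
            }
        ],
    }


prediction = make_choice_prediction("Positive", 0.87)
```

Keeping the formatting in a small helper like this makes it easy to test separately from the model code itself.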

Start the server

To start the ML backend server in the background, change to the ML backend directory and run the command:

$ gunicorn _wsgi:app --bind "0.0.0.0:9090" --workers 2 &
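
Since the server runs in the background, it may take a moment before it accepts connections. One way to check this from a second terminal is to poll the port until it opens. The sketch below uses only the Python standard library; the function name wait_for_port and its default timeout are assumptions, and an open port only shows that the server is listening, not that the model loaded correctly.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=9090, timeout=30.0, interval=0.5):
    """Return True once a TCP connection succeeds, False after the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling wait_for_port() with the defaults checks port 9090 on the local machine, which matches the --bind value used in the command above.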

Load the model

Open the project settings in Label Studio and go to Machine Learning. Then, click on Add Model, specify the URL, e.g. http://0.0.0.0:9090, and press Validate and Save.

For more information on how to integrate Label Studio into your machine learning pipeline, check here.