RAGFlow


RAGFlow is an open-source retrieval-augmented generation (RAG) platform designed to manage data pipelines, knowledge bases, and large-language-model integration. It supports Hugging Face models, Ollama, and GPU acceleration via vLLM.

More information about RAGFlow can be found in the official documentation.

Initialization

For information on how to use the Initialization parameter, please refer to the Initialization - Bash script section of the documentation.

Signup and Login

After the job starts, open the RAGFlow web interface. On first access, registration is required: click Sign up at the bottom of the login page.

Note

All users are registered on the server with an email address. However, no email server is configured in the backend, so the app's web interface cannot send emails to users.

Important

By default, the registered user does not have admin privileges.

RAGFlow Services

The application starts several services upon startup, including Redis, MySQL, Elasticsearch, and MinIO.

Important

For the services to start, connect, and work correctly, select a machine with at least 16 GB of RAM.

Please wait while the app finishes loading; this may take a few minutes.
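The startup can be monitored from the terminal by polling the service ports. The helper below is a sketch (not part of RAGFlow), and the ports shown are the standard defaults for each service; your deployment may expose different ones:

```shell
# Hypothetical helper: wait until a TCP port accepts connections (bash only).
wait_for_port() {
  local host=$1 port=$2 tries=${3:-30}
  local i
  for ((i = 0; i < tries; i++)); do
    # /dev/tcp is a bash feature: opening it succeeds only if the port is open.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Example usage with the services' standard default ports (assumptions):
# wait_for_port 127.0.0.1 3306   # MySQL
# wait_for_port 127.0.0.1 6379   # Redis
# wait_for_port 127.0.0.1 9200   # Elasticsearch
# wait_for_port 127.0.0.1 9000   # MinIO
```

The function returns non-zero if the port does not open within the given number of attempts, so it can be used in scripts that should fail fast when a service is down.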

Data Directory Structure

Upon the initial launch, users are prompted to import a directory from UCloud. If the directory is empty, the app will automatically create a structured folder system for storing models, caches, configurations, and more:

my_data_volume/
├── cache
├── es
├── logs
│   ├── backend.log
│   ├── backend_admin.log
│   ├── minio.log
│   ├── ollama.log
│   ├── redis_log.txt
│   └── te_0.log
├── minio
├── models
│   ├── blobs
│   └── manifests
├── mysql
│   ├── mysql
│   ├── rag_flow
│   └── ...
└── redis
    └── redis.conf

For convenience, the path to the imported data volume is stored in the DATA_DIR environment variable.

Note

Avoid spaces in the directory name. Including them will result in errors during startup when the directory tree is created.
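A quick pre-flight check for this can be scripted. The helper below is hypothetical (not part of RAGFlow), and the fallback path is only illustrative; DATA_DIR is set by the app inside the job:

```shell
# Hypothetical check: warn if the data volume path contains spaces,
# which break directory-tree creation at startup.
check_data_dir() {
  case "$1" in
    *" "*) echo "WARNING: path contains spaces: $1"; return 1 ;;
    *)     echo "OK: $1"; return 0 ;;
  esac
}

check_data_dir "${DATA_DIR:-/work/my_data_volume}"   # fallback path is illustrative
```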

Admin Privileges

By default, accounts created through the Sign Up page are assigned a standard user role. Superuser privileges grant access to the admin page, where user accounts can be modified or added.

To grant a user superuser (admin) privileges, the role must be updated directly in the database. This can be done via the following steps:

  • Open the terminal by clicking on the blue button at the top of the RAGFlow job progress view.

  • Open the MySQL command-line client as root:

    $ mysql -h127.0.0.1 -P $MYSQL_PORT -uroot -pinfini_rag_flow
    
  • Switch to the RAGFlow database and upgrade the user role:

    USE rag_flow;
    UPDATE user SET is_superuser = 1 WHERE email = 'user-email';
    

    where user-email is the email address used during registration.
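To confirm that the change took effect, the same credentials can be used to run a non-interactive query from the terminal (a sketch; replace user-email with the registered address):

```shell
# Check the role directly; is_superuser should now be 1 for the given user.
mysql -h127.0.0.1 -P "$MYSQL_PORT" -uroot -pinfini_rag_flow rag_flow \
  -e "SELECT email, is_superuser FROM user WHERE email = 'user-email';"
```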

Add Members

To collaborate within the same RAGFlow instance, the application must be started with an attached public link (see: Configure custom links).

The link can be shared with the collaborators. Opening the link leads to the RAGFlow login page, where the collaborators can sign up.

After they sign up, the collaborator's email can be added in the profile settings: click the logo in the top-right corner and navigate to Team. Team members can upload and parse documents in shared datasets and use shared agents (see the RAGFlow documentation for details).

Note

To prevent unauthorized sign-ups, use the Disable signup option; this is recommended when sharing the application through a custom public link. When Disable signup is selected, new members can be added only through the admin page.

Add a new member through the admin page

A new member can be added through the admin page as follows:

  • Open the Admin page by appending /admin to your RAGFlow URL, for example: app-mylink.cloud.sdu.dk/admin.

  • Create the collaborator’s account in the Admin panel.

  • Add them to the team:

    • Click on the profile/logo in the top-right corner

    • Navigate to Team

    • Add the collaborator’s registered email

Adding and Configuring LLMs

When submitting a RAGFlow job, several optional parameters allow you to configure and pre-load LLMs.

Select Ollama models

This option allows you to download or load specific Ollama models before the job starts.

The models are automatically downloaded at startup, so no manual ollama pull is required. They can then be configured in the Model Providers section (described below).

Multiple models can be specified by separating them with commas:

llama3.2:3b,all-minilm:22m

A full list of available Ollama models can be found here.
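The comma-separated format maps to a simple split-and-pull loop. The snippet below only illustrates the idea; the app performs the downloads itself at startup, so none of this needs to be run manually:

```shell
# Illustration: split a comma-separated model list and process each entry.
MODELS="llama3.2:3b,all-minilm:22m"
IFS=',' read -ra model_list <<< "$MODELS"
for m in "${model_list[@]}"; do
  echo "model to pull: $m"   # in a live session this would be: ollama pull "$m"
done
```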

Import Ollama models

This option allows you to specify the path to an existing directory containing Ollama model files.

Max loaded Ollama models

This option controls how many Ollama models can remain loaded in memory simultaneously and corresponds to the OLLAMA_MAX_LOADED_MODELS environment variable. A higher value allows faster switching between models. A lower value reduces memory usage. The default value, OLLAMA_MAX_LOADED_MODELS=1, is sufficient for most use cases.

Note

This setting does not control which models are available — only how many can be active in memory at the same time.

Download Ollama Models from Terminal

Models can be downloaded directly from the terminal app using the Ollama CLI.

Open the terminal by clicking on the blue button at the top of the job progress view, and write:

ollama pull llama3.3:70b

Tip

The terminal displays the download progress for each model layer and reports success once the digest is verified:

pulling manifest
pulling 4824460d29f2... 100% ▕████████████████████████████████████████▏ 42 GB
pulling 948af2743fc7... 100% ▕████████████████████████████████████████▏ 1.5 KB
pulling bc371a43ce90... 100% ▕████████████████████████████████████████▏ 7.6 KB
pulling 56bb8bd477a5... 100% ▕████████████████████████████████████████▏ 96 B
pulling c7091aa45e9b... 100% ▕████████████████████████████████████████▏ 562 B
verifying sha256 digest
writing manifest
success

By default, models are stored within the imported data volume as shown here:

my_data_volume/models/
├── blobs
│   ├── sha256-...
│   └── ...
└── manifests
    └── registry.ollama.ai
        └── library
            └── llama3.2
                └── 3b

The user can specify a different directory for models using the Import Ollama models optional parameter.

Integration via model providers

To integrate the Ollama models via the RAGFlow UI, click on your profile logo in the top-right corner of the page to open the user settings and select Model providers. Select Ollama from the Available models list on the right side of the page. In the popup:

  • Select the Ollama model type and enter the name. For example:
    deepseek-r1:1.5b or llama3.2:latest for chat model and
    all-minilm:22m or bge-m3:latest for embedding model.

  • Add the base URL, i.e. http://0.0.0.0:11434.

  • Add the Max tokens number.

  • Press OK to add the model.

Now it is possible to set the model as a default model.
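If the model cannot be added, it is worth verifying that the Ollama server is reachable at the base URL entered above. Ollama's /api/tags endpoint lists the models that have been pulled (the port below is the default assumed throughout this guide):

```shell
# Should return a JSON document listing the locally available models.
curl -s http://0.0.0.0:11434/api/tags
```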

Download Hugging Face Models from Terminal

Open the terminal by clicking on the blue button at the top of the job progress view, and write:

$ hf download OpenGVLab/InternVL3_5-1B

Start the vLLM server:

$ vllm serve OpenGVLab/InternVL3_5-1B --trust-remote-code --enforce-eager

Note

The vLLM server runs as a foreground process and must be restarted in every new session.
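To keep the server running after the terminal is closed, it can be started in the background instead. This is a sketch; the log file path is only an illustrative choice within the imported data volume:

```shell
# Run vLLM detached from the terminal, logging to the data volume.
nohup vllm serve OpenGVLab/InternVL3_5-1B --trust-remote-code --enforce-eager \
  > "$DATA_DIR/logs/vllm.log" 2>&1 &
```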

Integration via model providers

To integrate the Hugging Face model via the RAGFlow UI, click on your profile logo in the top-right corner of the page to open the user settings and select Model providers. Select HuggingFace from the Available models list on the right side of the page. In the popup:

• Select the model type based on the loaded model; for the example above, chat.

  • Include the full model identifier OpenGVLab/InternVL3_5-1B for the model name.

  • Provide the default base URL: http://0.0.0.0:8000/v1
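Since vLLM exposes an OpenAI-compatible API, the base URL can be checked from the terminal before adding the model in the UI (the port below is vLLM's default):

```shell
# Should return a JSON document listing the models currently being served.
curl -s http://0.0.0.0:8000/v1/models
```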

Document parsing

When documents are uploaded, RAGFlow splits them into smaller chunks before generating embeddings. Each embedding model has a maximum context window (token limit).

An error occurs if a chunk exceeds the token limit of the selected embedding model or LLM. To fix this:

  • Use an embedding model or LLM with a larger maximum context window.

  • Reduce the chunk size in the ingestion pipeline settings. The ingestion pipeline settings can be found by clicking on the parsing method (e.g., general, manual, book).

  • In most cases, smaller chunk sizes resolve the issue and ensure compatibility with the embedding model’s context window.

GPU Usage Guidelines

Using a GPU significantly improves response time when chatting with larger LLMs.

Use a GPU if:

  • Running medium or large models (e.g., 7B+)

  • Using vLLM

  • Serving multiple users

  • Experiencing slow responses on CPU

CPU is sufficient for:

  • Small models (1B–3B)

  • Embedding models

  • Testing and light usage

Important

Ensure that the GPU has enough VRAM for the selected model by comparing the model size (in GB) with the available GPU memory; otherwise the model may fail to load or fall back to CPU.
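As a rough rule of thumb (an assumption, not an official formula), a model needs about 2 GB of VRAM per billion parameters at 16-bit precision, plus a few GB of overhead for activations and the KV cache:

```shell
# Hypothetical back-of-the-envelope VRAM estimate for a 16-bit model.
estimate_vram_gb() {
  local params_billions=$1
  echo $(( params_billions * 2 + 2 ))   # ~2 GB per billion params + ~2 GB overhead
}

estimate_vram_gb 7    # prints 16, i.e. ~16 GB for a 7B model
# Compare against the GPU's total memory, e.g.:
#   nvidia-smi --query-gpu=memory.total --format=csv
```

Quantized models (e.g. 4-bit) need considerably less than this estimate, so treat it as an upper bound for full-precision weights.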