RAGFlow¶
RAGFlow is an open-source retrieval-augmented generation (RAG) platform designed to manage data pipelines, knowledge bases, and large-language-model integration. It supports Hugging Face models, Ollama, and GPU acceleration via vLLM.
More information about RAGFlow can be found in the official documentation.
Initialization¶
For information on how to use the Initialization parameter, please refer to the Initialization - Bash script section of the documentation.
Signup and Login¶
After the job starts, open the RAGFlow web interface. Registration is required before the first login: click Sign up at the bottom of the page.
Note
All users are registered on the server with an email address. However, there is no email server configured in the backend, so it is not possible to send emails to users from the app's web interface.
Important
By default, the registered user does not have admin privileges.
RAGFlow Services¶
The application starts several services on startup, including Redis, MySQL, Elasticsearch, and MinIO.
Important
For the services to start, connect, and work correctly, select a machine with at least 16 GB of RAM.
Please wait while the app finishes loading, which may take a few minutes.
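To verify that the selected machine meets the memory requirement, the total RAM can be checked from the job's terminal. A minimal check on Linux (reads /proc/meminfo):

```shell
# Print total system RAM in GB (Linux only; reads /proc/meminfo)
awk '/MemTotal/ { printf "Total RAM: %.1f GB\n", $2 / 1024 / 1024 }' /proc/meminfo
```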
Data Directory Structure¶
Upon the initial launch, users are prompted to import a directory from UCloud. If the directory is empty, the app will automatically create a structured folder system for storing models, caches, configurations, and more:
my_data_volume/
├── cache
├── es
├── logs
│ ├── backend.log
│ ├── backend_admin.log
│ ├── minio.log
│ ├── ollama.log
│ ├── redis_log.txt
│ └── te_0.log
├── minio
├── models
│ ├── blobs
│ └── manifests
├── mysql
│ ├── mysql
│ ├── rag_flow
│ └── ...
└── redis
    └── redis.conf
For convenience, the path to the imported data volume is stored in the DATA_DIR environment variable.
Note
Avoid spaces in the directory name. Including them will result in errors during startup when the directory tree is created.
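The space restriction in the note above can be checked before importing the volume. A small sketch (the directory name below is a hypothetical example):

```shell
# Reject a data-volume name containing spaces before importing it
DIR="my data volume"   # hypothetical directory name
case "$DIR" in
  *" "*) echo "Invalid: '$DIR' contains spaces" ;;
  *)     echo "OK: '$DIR'" ;;
esac
```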
Admin Privileges¶
By default, accounts created through the Sign Up page are assigned a standard user role. Superuser privileges grant access to the admin page, where user accounts can be modified or added.
To grant a user superuser (admin) privileges, the role must be updated directly in the database. This can be done via the following steps:
Open the terminal by clicking on the blue button at the top of the RAGFlow job progress view.
Open the MySQL command-line client as root:
$ mysql -h127.0.0.1 -P $MYSQL_PORT -uroot -pinfini_rag_flow
Switch to the RAGFlow database and upgrade the user role:
USE rag_flow; UPDATE user SET is_superuser = 1 WHERE email = 'user-email';
where user-email is the email address used during registration.
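The two steps above can also be combined into a single non-interactive command. A sketch that builds the SQL first (the email address below is a placeholder; MYSQL_PORT is the variable set by the app):

```shell
# Build the SQL that grants superuser rights; EMAIL is a placeholder
EMAIL='user@example.com'
SQL="USE rag_flow; UPDATE user SET is_superuser = 1 WHERE email = '${EMAIL}';"
echo "$SQL"
# Apply it against the running service (requires the job's MySQL to be up):
#   mysql -h127.0.0.1 -P "$MYSQL_PORT" -uroot -pinfini_rag_flow -e "$SQL"
```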
Add Members¶
To collaborate within the same RAGFlow instance, the application must be started with an attached public link (see: Configure custom links).
The link can be shared with the collaborators. Opening the link leads to the RAGFlow login page, where the collaborators can sign up.
After they sign up, you can add each collaborator's email in the profile settings under Team, by clicking the logo in the top-right corner. Team members can upload and parse documents in shared datasets and use shared agents (see the RAGFlow documentation for details).
Note
To prevent unauthorized sign-ups, use the Disable signup option; this is recommended when sharing the application through a custom public link. When Disable signup is selected, new members can be added only through the admin page.
Add a new member through the admin page¶
A new member can be added via the admin page as follows:
Open the Admin page by appending /admin to your RAGFlow URL, for example: app-mylink.cloud.sdu.dk/admin.
Create the collaborator's account in the Admin panel.
Add them to the team:
Click on the profile/logo in the top-right corner
Navigate to Team
Add the collaborator's registered email
Adding and Configuring LLMs¶
When submitting a RAGFlow job, several optional parameters allow you to configure and pre-load LLMs.
Select Ollama models¶
This option allows you to download or load specific Ollama models before the job starts.
The models are automatically downloaded at startup, so no manual ollama pull is required. They can then be configured in the Model Providers section (described below).
Multiple models can be specified by separating them with commas:
llama3.2:3b,all-minilm:22m
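Before submitting the job, the comma-separated list can be sanity-checked so that each entry follows the name:tag pattern Ollama expects. A small sketch (MODELS holds an example value):

```shell
# Check that every comma-separated entry matches the name:tag pattern
MODELS="llama3.2:3b,all-minilm:22m"   # example value
for m in $(echo "$MODELS" | tr ',' ' '); do
  case "$m" in
    *:*) echo "ok: $m" ;;
    *)   echo "missing tag: $m (Ollama will default to :latest)" ;;
  esac
done
```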
A full list of available Ollama models can be found here.
Import Ollama models¶
This option allows you to specify the path to an existing directory containing Ollama model files.
Max loaded Ollama models¶
This option controls how many Ollama models can remain loaded in memory simultaneously and corresponds to the OLLAMA_MAX_LOADED_MODELS environment variable. A higher value allows faster switching between models. A lower value reduces memory usage. The default value, OLLAMA_MAX_LOADED_MODELS=1, is sufficient for most use cases.
Note
This setting does not control which models are available — only how many can be active in memory at the same time.
Download Ollama Models from Terminal¶
Models can be downloaded directly via the terminal app by using the Ollama API.
Open the terminal by clicking on the blue button at the top of the job progress view, and write:
ollama pull llama3.3:70b
Tip
The command streams the download progress and ends with success:
pulling manifest
pulling 4824460d29f2... 100% ▕████████████████████████████████████████▏ 42 GB
pulling 948af2743fc7... 100% ▕████████████████████████████████████████▏ 1.5 KB
pulling bc371a43ce90... 100% ▕████████████████████████████████████████▏ 7.6 KB
pulling 56bb8bd477a5... 100% ▕████████████████████████████████████████▏ 96 B
pulling c7091aa45e9b... 100% ▕████████████████████████████████████████▏ 562 B
verifying sha256 digest
writing manifest
success
By default, models are stored within the imported data volume as shown here:
my_data_volume/models/
├── blobs
│ ├── sha256-...
│ └── ...
└── manifests
└── registry.ollama.ai
└── library
└── llama3.2
└── 3b
The user can specify a different directory for models using the Import Ollama models optional parameter.
Integration via model providers¶
To integrate the Ollama models via the RAGFlow UI, click on your profile logo in the top-right corner of the page to go to user settings, and select Model providers. Select Ollama from the Available models list, on the right side of the page. In the popup:
Select the Ollama model type and enter the model name. For example: deepseek-r1:1.5b or llama3.2:latest for a chat model, and all-minilm:22m or bge-m3:latest for an embedding model.
Add the base URL, i.e. http://0.0.0.0:11434.
Add the Max tokens number.
Press OK to add the model.
Now it is possible to set the model as a default model.
Download Hugging Face Models from Terminal¶
Open the terminal by clicking on the blue button at the top of the job progress view, and write:
$ hf download OpenGVLab/InternVL3_5-1B
Start the vLLM server:
$ vllm serve OpenGVLab/InternVL3_5-1B --trust-remote-code --enforce-eager
Note
The vLLM server runs as a foreground process and must be started again in each new session.
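Since the server does not survive the session, one option is to background it with nohup and write its output to the data volume's log directory. A sketch (assumes DATA_DIR is set by the app and vllm is on the PATH):

```shell
# Run vLLM in the background; log to the data volume (falls back to /tmp)
LOG="${DATA_DIR:-/tmp}/logs/vllm.log"
mkdir -p "$(dirname "$LOG")"
nohup vllm serve OpenGVLab/InternVL3_5-1B --trust-remote-code --enforce-eager \
  > "$LOG" 2>&1 &
echo "vLLM started (pid $!), logging to $LOG"
```

The log file can then be inspected with tail -f "$LOG" to follow startup progress.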
Integration via model providers¶
To integrate the HF model via the RAGFlow UI, click on your profile logo in the top-right corner of the page to go to user settings, and select Model providers. Select HuggingFace from the Available models list, on the right side of the page. In the popup:
Select the model type based on the loaded model; for the example above, chat.
Include the full model identifier OpenGVLab/InternVL3_5-1B as the model name.
Provide the default base URL: http://0.0.0.0:8000/v1
Document parsing¶
When documents are uploaded, RAGFlow splits them into smaller chunks before generating embeddings. Each embedding model has a maximum context window (token limit).
An error occurs if a chunk exceeds the token limit of the selected embedding model or LLM. To fix this:
Use an embedding model or LLM with a larger maximum context window.
Reduce the chunk size in the ingestion pipeline settings, which can be found by clicking on the parsing method (e.g., general, manual, book).
In most cases, smaller chunk sizes resolve the issue and ensure compatibility with the embedding model's context window.
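A rough rule of thumb for English text is about 4 characters per token, so a chunk's token count can be estimated from its length and compared to the model's context window. A quick estimate, assuming a hypothetical 512-token window:

```shell
# Estimate token count of a chunk (~4 chars/token) vs. a 512-token window
CHUNK_CHARS=3000   # hypothetical chunk length in characters
WINDOW=512         # assumed window; many small embedding models use 512 tokens
awk -v c="$CHUNK_CHARS" -v w="$WINDOW" 'BEGIN {
  t = c / 4
  printf "~%d tokens: %s\n", t, (t <= w ? "fits" : "exceeds the window - reduce chunk size")
}'
```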
GPU Usage Guidelines¶
Using a GPU significantly improves response time when chatting with larger LLMs.
Use a GPU if:
Running medium or large models (e.g., 7B+)
Using vLLM
Serving multiple users
Experiencing slow responses on CPU
CPU is sufficient for:
Small models (1B–3B)
Embedding models
Testing and light usage
Important
Ensure the GPU has enough VRAM for the selected model (roughly its size on disk plus runtime overhead); otherwise the model may fail to load or fall back to CPU.
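As a back-of-the-envelope check, weight memory is roughly parameters times bytes per weight, plus overhead for activations and the KV cache. A sketch with an assumed 20% overhead factor (both numbers below are illustrative):

```shell
# Rough VRAM estimate: parameters (billions) x bytes per weight + ~20% overhead
PARAMS_B=7          # model size in billions of parameters (example)
BYTES=2             # fp16/bf16 weights; 4-bit quantization would be ~0.5
awk -v p="$PARAMS_B" -v b="$BYTES" 'BEGIN {
  printf "Estimated VRAM: %.1f GB\n", p * b * 1.2
}'
```

The actual footprint also depends on context length and batch size, so treat the result as a lower bound when choosing a GPU.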