Synopsis

sc rag [-hV] [--etl=TARGET] [-o=OUTPUT_FILE] DOCUMENT

Description

Interact with the RAG (Retrieval-Augmented Generation) system. Process documents and run ETL operations so that AI responses can draw on their content. Supports local files, remote HTTPS resources, and GitHub repository files in several formats, including PDF, Markdown, JSON, HTML, and plain text.

ETL Operations
  • file - Extract and process documents, then store the result in a text file

  • vectorStore - Store document embeddings in a vector database

Supported Document Types
  • PDF files (.pdf)

  • Markdown files (.md, .markdown)

  • HTML web pages (.html, .htm)

  • JSON files (.json)

  • Plain text files (.txt)

Supported Protocols
  • file:// Local file system

  • https:// Remote HTTPS resources (secure only)

  • github:// GitHub repository files
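
The three schemes above can be told apart with a simple prefix match. The following is a minimal sketch, not part of sc itself; the function name document_scheme is hypothetical, and it only illustrates that plain http:// falls outside the supported list.

```shell
# Hypothetical helper (not part of sc): print the scheme of a DOCUMENT
# argument, or fail if it is not one of the three schemes listed above.
document_scheme() {
  case "$1" in
    file://*)   echo "file" ;;
    https://*)  echo "https" ;;
    github://*) echo "github" ;;
    *)
      echo "unsupported document URI: $1" >&2
      return 1
      ;;
  esac
}

document_scheme "file:///home/user/documents/manual.pdf"   # prints: file
document_scheme "http://example.com/page.html" || echo "rejected"
```

A check like this could run in a wrapper script before sc rag is called, so that an unsupported URI is caught early.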

Overview

The RAG (Retrieval-Augmented Generation) command processes documents and extracts relevant information to enhance AI conversations. It supports various document formats and can output processed content to files or vector stores for later retrieval.

Usage Modes

File Processing

Process documents and save the extracted content to a text file:

sc rag file:///document.pdf -o output.txt
sc rag https://example.com/article.html -o extracted.txt

Vector Store Processing

Process documents directly into a vector store for enhanced retrieval:

sc rag https://api.github.com/repos/user/repo --etl=vectorStore
sc rag file:///absolute/path/to/data.json --etl=vectorStore

Supported Formats

Document Types

  • PDF files: .pdf

  • Markdown files: .md, .markdown

  • Text files: .txt

  • HTML content: .html, .htm

  • JSON data: .json
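
As an illustration, the extension list above can be turned into a pre-check for a wrapper script. This is a sketch under assumptions: the function name is_supported_document is hypothetical, and matching extensions case-insensitively is a choice made here, not documented behavior of sc. URLs without a file extension, such as the API URL used earlier, are outside the scope of this check.

```shell
# Hypothetical pre-check (not part of sc): succeed if the name ends in one of
# the documented extensions. Case-insensitive matching is an assumption.
is_supported_document() {
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$name" in
    *.pdf|*.md|*.markdown|*.txt|*.html|*.htm|*.json) return 0 ;;
    *) return 1 ;;
  esac
}

is_supported_document "file:///research-paper.pdf" && echo "supported"
is_supported_document "archive.zip" || echo "not supported"
```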

Input Sources

  • Local files: file://

  • Remote HTTPS resources: https://

  • GitHub repository files: github://

Examples

Process a local PDF to text file:

sc rag file:///home/user/documents/manual.pdf -o extracted.txt

Extract content from a web page:

sc rag https://docs.spring.io/spring-ai/reference/index.html -o spring-ai-docs.txt

Load documents into vector store:

sc rag github://spring-projects/spring-framework/contents/README.md --etl=vectorStore

Process multiple document types:

sc rag file:///research-paper.pdf --etl=vectorStore
sc rag file:///project-documentation.md -o summary.txt
sc rag https://docs.spring.io/spring-ai/reference/index.html --etl=vectorStore
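
When several documents are processed in one go, a small loop can generate the commands. The sketch below only prints the sc rag invocations rather than running them. Both function names and the rule for deriving the output filename (last path segment with its extension replaced by .txt) are assumptions made for illustration.

```shell
# Hypothetical naming rule: take the last path segment and swap its
# extension for .txt.
output_name_for() {
  base="${1##*/}"          # strip everything up to the last slash
  echo "${base%.*}.txt"    # drop the final extension, append .txt
}

# Print (do not run) one 'sc rag' command per document.
print_rag_commands() {
  for doc in "$@"; do
    printf 'sc rag %s -o %s\n' "$doc" "$(output_name_for "$doc")"
  done
}

print_rag_commands \
  "file:///home/user/documents/manual.pdf" \
  "https://docs.spring.io/spring-ai/reference/index.html"
```

Printing the commands first makes it easy to review them; piping the output to sh would execute them.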

Processing Pipeline

The RAG command runs each document through a five-stage pipeline:

  1. Document Loading: Supports multiple formats and sources

  2. Content Extraction: Extracts text while preserving structure

  3. Text Splitting: Breaks content into manageable chunks

  4. Embedding Generation: Creates vector embeddings for semantic search

  5. Storage: Saves to a file or a vector store, depending on the --etl target
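
To make the text-splitting step concrete, here is a deliberately simple sketch that breaks input into chunks of a fixed number of words. The splitting strategy sc actually uses is not described here, so this stands in for the general idea only; the function name chunk_words and the word-count rule are assumptions.

```shell
# Illustration only: fixed-size word chunks, NOT the splitter sc uses.
# Usage: chunk_words N < text   (prints one chunk of up to N words per line)
chunk_words() {
  tr -s '[:space:]' '\n' |
    awk -v n="$1" '
      NF {
        buf = (count ? buf " " : "") $0
        count++
        if (count == n) { print buf; buf = ""; count = 0 }
      }
      END { if (count) print buf }
    '
}

printf 'Retrieval augmented generation grounds answers in documents' | chunk_words 3
# prints:
#   Retrieval augmented generation
#   grounds answers in
#   documents
```

Real splitters usually respect sentence or token boundaries and add overlap between chunks; the fixed word count keeps the example short.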

Integration with Chat

Documents processed with --etl=vectorStore become available for enhanced chat sessions:

# Process documentation
sc rag https://docs.spring.io/spring-boot/index.html --etl=vectorStore

# Now chat with enhanced context
sc chat "How do I configure Spring Boot actuators?"

The chat command automatically retrieves relevant information from processed documents to provide more accurate and contextual responses.
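
The two steps can be combined in a small wrapper so that the question is only asked once ingestion has succeeded. This is a sketch under assumptions: the function name ingest_then_ask and the SC_BIN variable are hypothetical, and it assumes sc exits with a non-zero status when processing fails.

```shell
# SC_BIN is a hypothetical override so the wrapper can be dry-run,
# for example with SC_BIN=echo. It defaults to the real command.
SC_BIN="${SC_BIN:-sc}"

ingest_then_ask() {
  doc="$1"
  question="$2"
  # Stop if ingestion fails (assumes a non-zero exit status on failure);
  # otherwise the chat would run without the new context.
  if ! "$SC_BIN" rag "$doc" --etl=vectorStore; then
    echo "ingest failed: $doc" >&2
    return 1
  fi
  "$SC_BIN" chat "$question"
}
```

Setting SC_BIN=echo before calling ingest_then_ask prints the two command lines instead of running them, which is a cheap way to check the arguments.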

Options

--etl=TARGET

ETL operation target specifying how to process the document.

  • file - Extract content and optionally save to output file

  • vectorStore - Process and store embeddings in a vector database

    Default: file

-h, --help

Show this help message and exit.

-o, --output=OUTPUT_FILE

Name of the file to which the processed content is written.

Valid only with the '--etl=file' operation (the default).

Example: --output=summary.txt
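
The constraint between --output and --etl can be checked before sc is invoked. The following sketch mirrors the rules stated above; the function name check_rag_options and its return codes are hypothetical, and how sc itself reports such a conflict is not covered here.

```shell
# Hypothetical pre-check (not part of sc).
# Usage: check_rag_options ETL_TARGET [OUTPUT_FILE]
check_rag_options() {
  etl="${1:-file}"     # mirrors the documented default
  output="${2:-}"
  case "$etl" in
    file|vectorStore) ;;
    *) echo "unknown --etl target: $etl" >&2; return 2 ;;
  esac
  if [ -n "$output" ] && [ "$etl" != "file" ]; then
    echo "--output requires --etl=file" >&2
    return 1
  fi
}

check_rag_options file summary.txt && echo "ok"
check_rag_options vectorStore summary.txt || echo "rejected"
```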

-V, --version

Print version information and exit.

Arguments

DOCUMENT

The document to process using one of the supported protocols:

Local Files
  • file:///absolute/path/to/data.json

GitHub Files
  • github://user/repo/contents/path/to/file

  • github://user/repo/contents/path/to/another/file

Remote Files (HTTPS only)
  • https://example.com/article.html
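
The github:// form listed above follows a fixed pattern, so it can be assembled from its parts. This is a minimal sketch; the function name github_uri is hypothetical, and it does nothing beyond string formatting.

```shell
# Hypothetical helper: build a github:// DOCUMENT argument
# from a user, a repository, and a path inside the repository.
github_uri() {
  user="$1"
  repo="$2"
  path="${3#/}"   # tolerate a leading slash in the path
  echo "github://$user/$repo/contents/$path"
}

github_uri spring-projects spring-framework README.md
# prints: github://spring-projects/spring-framework/contents/README.md
```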

Error Handling

Common issues and solutions:

File not found

Verify the file path and ensure the file exists

Unsupported format

Check that the file extension is supported

Network errors

Verify URL accessibility and network connectivity

Permission denied

Ensure read permissions for input files and write permissions for output directory
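
For local files, the "file not found" and "permission denied" cases can be checked up front. The sketch below is an assumption-laden illustration: the function name preflight and its messages are hypothetical and are not the messages sc prints.

```shell
# Hypothetical pre-flight check for local input (not part of sc).
# Usage: preflight INPUT [OUTPUT_FILE]
preflight() {
  input="${1#file://}"   # accept either a plain path or a file:// URI
  output="${2:-}"
  if [ ! -e "$input" ]; then
    echo "file not found: $input" >&2
    return 1
  fi
  if [ ! -r "$input" ]; then
    echo "permission denied (read): $input" >&2
    return 1
  fi
  # The output directory must be writable when an output file is given.
  if [ -n "$output" ] && [ ! -w "$(dirname -- "$output")" ]; then
    echo "permission denied (write): $(dirname -- "$output")" >&2
    return 1
  fi
}

preflight "file:///no/such/manual.pdf" extracted.txt || echo "fix the input path first"
```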

See Also

sc(1), sc-chat(1), sc-config(1)