Synopsis
sc rag [-hV] [--etl=TARGET] [-o=OUTPUT_FILE] DOCUMENT
Description
Interact with the RAG (Retrieval-Augmented Generation) system.
Process documents and perform ETL operations for enhanced AI responses.
Supports local files, remote HTTPS resources, and GitHub repository files in multiple formats, including PDF, Markdown, JSON, and HTML documents.
ETL Operations
- file - Extract and process documents, and store the result in a text file
- vectorStore - Store document embeddings in a vector database
Examples
sc rag file:///home/user/document.pdf --output=summary.txt
sc rag https://example.com/article.html --etl=vectorStore
sc rag file:///docs/manual.md --output=summary.txt --etl=file
Supported Document Types
- PDF files (.pdf)
- Markdown files (.md, .markdown)
- HTML web pages (.html, .htm)
- JSON files (.json)
- Plain text files (.txt)
Supported Protocols
- file:// Local file system
- https:// Remote HTTPS resources (secure only)
- github:// GitHub repository files
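The protocol rules above (file, https, and github accepted; plain http rejected) can be expressed as a small pre-check. This is a minimal sketch under those stated rules only; `classify_document` and `SUPPORTED_SCHEMES` are hypothetical names, not part of sc.

```python
from urllib.parse import urlparse

# Protocols from "Supported Protocols"; plain http:// is intentionally absent.
SUPPORTED_SCHEMES = {"file", "https", "github"}


def classify_document(document: str) -> str:
    """Return the protocol of a DOCUMENT argument, or raise ValueError.

    Hypothetical helper that mirrors the documented rules; it is not
    part of sc itself.
    """
    scheme = urlparse(document).scheme.lower()
    if scheme == "http":
        raise ValueError(f"Only HTTPS URLs are supported: {document}")
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"Unsupported protocol: {scheme or '<none>'}")
    return scheme


if __name__ == "__main__":
    for doc in ("file:///home/user/document.pdf",
                "https://example.com/article.html",
                "github://user/repo/contents/README.md"):
        print(f"{doc} -> {classify_document(doc)}")
```

A bare path such as /home/user/a.pdf has no scheme and is rejected here, which matches the examples in this page always using file:// for local documents.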
Overview
The RAG (Retrieval-Augmented Generation) command processes documents and extracts relevant information to enhance AI conversations. It supports various document formats and can output processed content to files or vector stores for later retrieval.
Usage Modes
File Processing
Process documents and save the extracted content to a text file:
sc rag file:///document.pdf -o output.txt
sc rag https://example.com/article.html -o extracted.txt
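For an HTML page such as the article above, the text that ends up in the output file is essentially the page with its markup removed. The sketch below illustrates that idea with Python's standard-library html.parser, assuming that script and style content should be dropped; `html_to_text` is a hypothetical helper, and sc's actual extractor may preserve more structure.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text that is not inside a skipped element.
        if text and not self._skip_depth:
            self._parts.append(text)

    def text(self) -> str:
        return "\n".join(self._parts)


def html_to_text(html: str) -> str:
    """Return the page's text, one fragment per line (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    page = "<h1>Title</h1><p>Hello <b>world</b></p><script>track()</script>"
    print(html_to_text(page))
```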
Vector Store Processing
Process documents directly into a vector store for enhanced retrieval:
sc rag https://api.github.com/repos/user/repo --etl=vectorStore
sc rag file:///absolute/path/to/data.json --etl=vectorStore
Supported Formats
Document Types
- PDF files: .pdf
- Markdown files: .md, .markdown
- Text files: .txt
- HTML content: .html, .htm
- JSON data: .json
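The extension table above can be applied to a DOCUMENT URI as follows. This is a sketch assuming format is chosen purely by file extension; `detect_format` is hypothetical, and sc may also rely on other signals (for example, for extensionless URLs such as https://api.github.com/repos/user/repo, which this sketch rejects).

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extension-to-format table copied from "Document Types".
FORMATS = {
    ".pdf": "PDF",
    ".md": "Markdown",
    ".markdown": "Markdown",
    ".txt": "Text",
    ".html": "HTML",
    ".htm": "HTML",
    ".json": "JSON",
}


def detect_format(document: str) -> str:
    """Map a DOCUMENT URI to a format name by its file extension.

    Hypothetical helper: it only looks at the extension of the URI path,
    ignoring any query string, and matches case-insensitively.
    """
    suffix = PurePosixPath(urlparse(document).path).suffix.lower()
    if suffix not in FORMATS:
        raise ValueError(f"Unsupported format: {suffix or '<no extension>'}")
    return FORMATS[suffix]
```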
Input Sources
- Local files: file:///path/to/document.pdf
- HTTPS URLs: https://example.com/document.html
- GitHub: github://username/repository/contents/file-path
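The github:// form resembles the path of the GitHub REST API repository contents endpoint, GET /repos/{owner}/{repo}/contents/{path}. Assuming sc resolves it that way (an assumption; this page does not say), the translation can be sketched as below. `github_to_api_url` is a hypothetical name.

```python
from urllib.parse import urlparse


def github_to_api_url(document: str) -> str:
    """Translate github://owner/repo/contents/path into a GitHub REST API URL.

    Assumption: the github:// form mirrors the REST endpoint
    GET /repos/{owner}/{repo}/contents/{path}; sc's real mapping may differ.
    """
    parsed = urlparse(document)
    if parsed.scheme != "github" or not parsed.netloc:
        raise ValueError(f"Not a github:// URI: {document}")
    # netloc holds the owner; the path is /repo/contents/<file path>.
    parts = parsed.path.lstrip("/").split("/", 2)
    if len(parts) != 3 or parts[1] != "contents" or not parts[2]:
        raise ValueError("Expected github://owner/repo/contents/path")
    repo, _, file_path = parts
    return f"https://api.github.com/repos/{parsed.netloc}/{repo}/contents/{file_path}"
```

By default that endpoint returns JSON metadata with the file body Base64-encoded, so a loader built this way would need to decode it or request a raw media type.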
Examples
Process a local PDF to text file:
sc rag file:///home/user/documents/manual.pdf -o extracted.txt
Extract content from a web page:
sc rag https://docs.spring.io/spring-ai/reference/index.html -o spring-ai-docs.txt
Load documents into vector store:
sc rag github://spring-projects/spring-framework/contents/README.md --etl=vectorStore
Process multiple document types:
sc rag file:///research-paper.pdf --etl=vectorStore
sc rag file:///project-documentation.md -o summary.txt
sc rag https://docs.spring.io/spring-ai/reference/index.html --etl=vectorStore
Processing Pipeline
The RAG command processes each document in five stages:
- Document Loading: Reads the document from any supported source and format
- Content Extraction: Extracts text while preserving structure
- Text Splitting: Breaks content into manageable chunks
- Embedding Generation: Creates vector embeddings for semantic search
- Storage: Saves to a file or a vector store, depending on the ETL target
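The Text Splitting stage can be illustrated with a sliding window over words. This is a sketch only: real splitters usually count model tokens rather than whitespace-separated words, and the `chunk_size` and `overlap` defaults here are arbitrary, not sc's settings. `split_text` is a hypothetical name.

```python
def split_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into windows of chunk_size words that overlap by `overlap` words.

    Hypothetical helper; the overlap keeps sentences that straddle a
    boundary visible in both neighbouring chunks.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks
```

Each chunk would then be embedded and stored separately, so retrieval can return just the passages relevant to a question instead of the whole document.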
Integration with Chat
Documents processed with --etl=vectorStore become available for enhanced chat sessions:
# Process documentation
sc rag https://docs.spring.io/spring-boot/index.html --etl=vectorStore
# Now chat with enhanced context
sc chat "How do I configure Spring Boot actuators?"
The chat command automatically retrieves relevant information from processed documents to provide more accurate and contextual responses.
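Retrieval from a vector store generally means embedding the question, ranking stored chunks by similarity, and placing the best matches in the prompt. The sketch below shows that flow with a toy bag-of-words "embedding" and cosine similarity standing in for a real embedding model and vector database; `ToyVectorStore`, `embed`, and `build_prompt` are hypothetical, and this page does not document how sc chat actually builds its prompt.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory list of (chunk, vector) pairs; a real store persists and indexes them."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, Counter]] = []

    def add(self, chunks: list[str]) -> None:
        for chunk in chunks:
            self._rows.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, store: ToyVectorStore, k: int = 2) -> str:
    # Prepend the top-k retrieved chunks as context for the model.
    context = "\n".join(store.search(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}"
```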
Options
--etl=TARGET
    ETL operation target specifying how to process the document:
    - file - Extract content and optionally save it to an output file
    - vectorStore - Process and store embeddings in a vector database
    Default: file
-h, --help
    Show this help message and exit.
-o, --output=OUTPUT_FILE
    Output filename for the RAG response.
    Must be used with the '--etl=file' operation.
    Saves processed content to the specified file.
    Example: --output=summary.txt
-V, --version
    Print version information and exit.
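When sc rag is called from a script, the option rules above (two ETL targets, and --output only with --etl=file) can be checked before the command runs. A minimal sketch; `build_rag_command` is a hypothetical helper that only assembles the argument list in Synopsis order and does not run sc.

```python
import shlex

ETL_TARGETS = {"file", "vectorStore"}


def build_rag_command(document, etl="file", output=None):
    """Return an argv list for `sc rag`, enforcing the documented option rules.

    Hypothetical helper: it only assembles arguments and does not run sc.
    """
    if etl not in ETL_TARGETS:
        raise ValueError(f"Unknown ETL target: {etl}")
    if output is not None and etl != "file":
        raise ValueError("--output must be used with --etl=file")
    argv = ["sc", "rag", f"--etl={etl}"]
    if output is not None:
        argv.append(f"--output={output}")
    argv.append(document)  # DOCUMENT comes last, as in the Synopsis
    return argv


if __name__ == "__main__":
    print(shlex.join(build_rag_command("file:///docs/manual.md", output="summary.txt")))
```

If sc is on the PATH, the returned list could be passed to subprocess.run(argv, check=True).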
Arguments
DOCUMENT
    The document to process, using one of the supported protocols:
    Local Files
    - file:///path/to/document.pdf
    GitHub Files
    - github://user/repo/contents/path/to/file
    - github://user/repo/contents/path/to/another/file
    Remote Files (HTTPS only)
    - https://github.com/user/repo/raw/main/README.md
    Note: Only HTTPS URLs are supported, for security reasons.
Error Handling
Common issues and solutions:
- File not found: Verify the file path and ensure the file exists.
- Unsupported format: Check that the file extension is supported.
- Network errors: Verify URL accessibility and network connectivity.
- Permission denied: Ensure read permissions for input files and write permissions for the output directory.
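The local-file cases above (missing input, unreadable input, unwritable output directory) can be caught before invoking sc rag. A sketch under the assumption that only file:// inputs and a local output path need checking; `preflight` is hypothetical, and network and format errors are left to sc.

```python
import os
from urllib.parse import unquote, urlparse


def preflight(document, output=None):
    """Return a list of problems found before running `sc rag` (empty if none).

    Hypothetical helper covering the local-file cases under "Error Handling".
    """
    problems = []
    parsed = urlparse(document)
    if parsed.scheme == "file":
        path = unquote(parsed.path)
        if not os.path.isfile(path):
            problems.append(f"File not found: {path}")
        elif not os.access(path, os.R_OK):
            problems.append(f"Permission denied (read): {path}")
    if output is not None:
        out_dir = os.path.dirname(os.path.abspath(output))
        if not os.path.isdir(out_dir):
            problems.append(f"Output directory not found: {out_dir}")
        elif not os.access(out_dir, os.W_OK):
            problems.append(f"Permission denied (write): {out_dir}")
    return problems
```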
See Also
sc(1), sc-chat(1), sc-config(1)