Absolute path to the root directory to process
Options for directory traversal, chunking, and cost estimation
Object containing the chunked directory structure and statistics
// Prepare a directory for embedding with default settings
const { chunkedDirectory, stats } = prepare_directory_embedding('/path/to/project');
console.log(`Ready to embed ${stats.numChunks} chunks from ${stats.numFiles} files`);
console.log(`Estimated cost: $${stats.estimatedCost.toFixed(4)}`);
// Prepare TypeScript files only with custom pricing
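const result = prepare_directory_embedding('/path/to/src', {
  fileExtensions: ['.ts', '.tsx'],
  delimiter: /\n{3,}/g, // split on runs of three or more newlines
  charsPerToken: 4, // rough heuristic: about 4 characters per token
  costPerToken: 0.00002 / 1000 // $0.00002 per 1K tokens (OpenAI text-embedding-3-small; verify current pricing)
});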
console.log(`Files: ${result.stats.numFiles}`);
console.log(`Chunks: ${result.stats.numChunks}`);
console.log(`Tokens: ${result.stats.estimatedTokens}`);
console.log(`Cost: $${result.stats.estimatedCost.toFixed(6)}`);
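The token and cost figures printed above are estimates, not exact counts. The following is a minimal sketch of how such an estimate could be derived from `charsPerToken` and `costPerToken`, assuming tokens are approximated as total characters divided by `charsPerToken` and rounded up. The function `estimateEmbeddingCost` and its types are hypothetical illustrations and are not part of the documented API.

```typescript
// Hypothetical option and result shapes; the field names mirror the example
// above, but the real library types may differ.
interface EstimateOptions {
  charsPerToken: number; // heuristic characters-per-token ratio
  costPerToken: number;  // price of a single token in dollars
}

interface EstimateResult {
  numChunks: number;
  estimatedTokens: number;
  estimatedCost: number;
}

// Approximates token count from character count, then multiplies by the
// per-token price. Rounding up is an assumption made for this sketch.
function estimateEmbeddingCost(
  chunks: string[],
  options: EstimateOptions
): EstimateResult {
  if (options.charsPerToken <= 0) {
    throw new RangeError('charsPerToken must be greater than zero');
  }
  let totalChars = 0;
  for (const chunk of chunks) {
    totalChars += chunk.length;
  }
  const estimatedTokens = Math.ceil(totalChars / options.charsPerToken);
  return {
    numChunks: chunks.length,
    estimatedTokens,
    estimatedCost: estimatedTokens * options.costPerToken,
  };
}
```

Because the ratio is a heuristic, the result is best treated as an order-of-magnitude figure; a real tokenizer would be needed for an exact count.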
// Use the chunked directory for embedding
const { chunkedDirectory } = prepare_directory_embedding('/path/to/project');
// Embed the chunks of each top-level file (files in nested directories need a recursive walk).
// `embedChunk` stands in for your own embedding call; run this inside an async function or an ES module.
for (const node of chunkedDirectory.children) {
  if (node.type === 'file') {
    for (const chunk of node.chunks) {
      await embedChunk(chunk.content, chunk.filePath, chunk.startLine);
    }
  }
}
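The loop above visits only the direct children of the root. A minimal sketch of a depth-first walk that also reaches files in nested directories follows. It assumes directory nodes carry a `children` array and a `type` of `'directory'`; those node shapes and the `walkChunks` helper are hypothetical, inferred from the example rather than taken from the library's type definitions.

```typescript
// Hypothetical node shapes inferred from the example; the real types may differ.
interface Chunk {
  content: string;
  filePath: string;
  startLine: number;
}

interface FileNode {
  type: 'file';
  chunks: Chunk[];
}

interface DirectoryNode {
  type: 'directory'; // assumed discriminator value for directories
  children: TreeNode[];
}

type TreeNode = FileNode | DirectoryNode;

// Depth-first generator that yields every chunk in the tree, in the order
// the nodes appear, descending into nested directories.
function* walkChunks(node: TreeNode): Generator<Chunk> {
  if (node.type === 'file') {
    for (const chunk of node.chunks) {
      yield chunk;
    }
    return;
  }
  for (const child of node.children) {
    yield* walkChunks(child);
  }
}
```

With a helper like this, the embedding loop could become `for (const chunk of walkChunks(chunkedDirectory)) { await embedChunk(chunk.content, chunk.filePath, chunk.startLine); }`, again assuming `embedChunk` is your own function.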
Prepares a directory for embedding by chunking all files and calculating statistics.
This is the recommended high-level entry point for preparing a directory for embedding. It chunks the files and computes statistics in a single call, returning both the chunked directory and a summary of the job's scope: file count, chunk count, estimated token count, and estimated cost.
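To illustrate the chunking step, here is a minimal sketch of one plausible way a file's text could be split on a `delimiter` pattern while recording each chunk's starting line. The `splitIntoChunks` function and the `TextChunk` type are hypothetical. Skipping whitespace-only chunks and using 1-based line numbers are assumptions made for this sketch, not confirmed behavior of `prepare_directory_embedding`.

```typescript
// Hypothetical chunk shape for this sketch only.
interface TextChunk {
  content: string;
  startLine: number; // 1-based line number where the chunk begins (assumed)
}

function countNewlines(s: string): number {
  const matches = s.match(/\n/g);
  return matches ? matches.length : 0;
}

// Splits `text` on every match of `delimiter`, tracking line numbers so each
// chunk knows where it started in the original file.
function splitIntoChunks(text: string, delimiter: RegExp): TextChunk[] {
  // Work on a global copy so the caller's regex state (lastIndex) is untouched.
  const flags = delimiter.flags.indexOf('g') === -1 ? delimiter.flags + 'g' : delimiter.flags;
  const pattern = new RegExp(delimiter.source, flags);

  const chunks: TextChunk[] = [];
  let cursor = 0; // index in `text` where the current chunk starts
  let line = 1;   // line number at `cursor`

  const pushChunk = (content: string): void => {
    if (content.trim() !== '') {
      chunks.push({ content, startLine: line });
    }
    line += countNewlines(content);
  };

  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    if (match[0].length === 0) {
      pattern.lastIndex++; // avoid an infinite loop on empty matches
      continue;
    }
    pushChunk(text.slice(cursor, match.index));
    line += countNewlines(match[0]); // the delimiter itself spans lines
    cursor = match.index + match[0].length;
  }
  pushChunk(text.slice(cursor));
  return chunks;
}
```

Tracking `startLine` during the split, rather than recomputing it afterwards, keeps the work to a single pass over the text.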