• Prepares a directory for embedding by chunking all files and calculating statistics.

    This is the recommended high-level function for preparing a directory structure for embedding operations. It combines chunking and statistics calculation into a single operation, providing both the chunked data and insights about the scope of the embedding task (token count, estimated cost, etc.).

    Parameters

    • rootPath: string

      Absolute path to the root directory to process

    • options: DirectoryStructureOptions & {
          delimiter?: RegExp;
      } & ChunkStatsOptions = {}

      Options for directory traversal, chunking, and cost estimation

    Returns DirectoryEmbeddingPreparation

    Object containing the chunked directory structure and statistics

    // Prepare a directory for embedding with default settings
    const { chunkedDirectory, stats } = prepare_directory_embedding('/path/to/project');
    console.log(`Ready to embed ${stats.numChunks} chunks from ${stats.numFiles} files`);
    console.log(`Estimated cost: $${stats.estimatedCost.toFixed(4)}`);
    // Prepare TypeScript files only with custom pricing
    const result = prepare_directory_embedding('/path/to/src', {
    fileExtensions: ['.ts', '.tsx'],
    delimiter: /\n{3,}/g,
    charsPerToken: 4,
    costPerToken: 0.00002 / 1000 // OpenAI text-embedding-3-small pricing
    });
    console.log(`Files: ${result.stats.numFiles}`);
    console.log(`Chunks: ${result.stats.numChunks}`);
    console.log(`Tokens: ${result.stats.estimatedTokens}`);
    console.log(`Cost: $${result.stats.estimatedCost.toFixed(6)}`);
    // Use the chunked directory for embedding
    const { chunkedDirectory, stats } = prepare_directory_embedding('/path/to/project');
    // Iterate through chunks and embed them
    for (const node of chunkedDirectory.children) {
    if (node.type === 'file') {
    for (const chunk of node.chunks) {
    await embedChunk(chunk.content, chunk.filePath, chunk.startLine);
    }
    }
    }