Commit 3f39fac

Enhancements and Test fixes (#27)
1 parent 24042cf commit 3f39fac

38 files changed (+2048, -278 lines changed)

.github/workflows/release-on-main.yml

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ jobs:
       - name: Run tests
         run: npm test
         # Temporary: skip tests for now
-        continue-on-error: true
+        #continue-on-error: true

       # Optional: build if needed (for TypeScript, bundlers, etc.)
       - name: Build package

README.md

Lines changed: 116 additions & 2 deletions
@@ -86,22 +86,136 @@ Once added, your editor will be fully connected to your Astra DB database.

 The server provides the following tools for interacting with Astra DB:

+### Collection Management
 - `GetCollections`: Get all collections in the database
-- `CreateCollection`: Create a new collection in the database
+- `CreateCollection`: Create a new collection in the database (with vector support)
 - `UpdateCollection`: Update an existing collection in the database
 - `DeleteCollection`: Delete a collection from the database
+- `EstimateDocumentCount`: Get estimate of the number of documents in a collection
+
+### Record Operations
 - `ListRecords`: List records from a collection in the database
 - `GetRecord`: Get a specific record from a collection by ID
 - `CreateRecord`: Create a new record in a collection
 - `UpdateRecord`: Update an existing record in a collection
 - `DeleteRecord`: Delete a record from a collection
 - `FindRecord`: Find records in a collection by field value
+- `FindDistinctValues`: Find distinct values for a specific field in a collection
+
+### Bulk Operations
 - `BulkCreateRecords`: Create multiple records in a collection at once
 - `BulkUpdateRecords`: Update multiple records in a collection at once
 - `BulkDeleteRecords`: Delete multiple records from a collection at once
+
+### Vector Search
+- `VectorSearch`: Perform vector similarity search on vector embeddings
+- `HybridSearch`: Combine vector similarity search with text search
+
+### Utility
 - `OpenBrowser`: Open a web browser for authentication and setup
 - `HelpAddToClient`: Get assistance with adding Astra DB client to your MCP client
-- `EstimateDocumentCount`: Get estimate of the number of documents in a collection
+## New Features and Capabilities
+
+### Vector Search Capabilities
+
+The Astra DB MCP server now includes powerful vector search capabilities for AI applications:
+
+#### VectorSearch
+
+Perform similarity search on vector embeddings:
+
+```javascript
+// Example usage
+const results = await VectorSearch({
+  collectionName: "my_vector_collection",
+  queryVector: [0.1, 0.2, 0.3, ...], // Your embedding vector
+  limit: 5, // Optional: Number of results to return (default: 10)
+  minScore: 0.7, // Optional: Minimum similarity score threshold
+  filter: { category: "article" } // Optional: Additional filter criteria
+});
+```
+
+#### HybridSearch
+
+Combine vector similarity search with text search for more accurate results:
+
+```javascript
+// Example usage
+const results = await HybridSearch({
+  collectionName: "my_vector_collection",
+  queryVector: [0.1, 0.2, 0.3, ...], // Your embedding vector
+  textQuery: "climate change", // Text query to search for
+  weights: { // Optional: Weights for hybrid search
+    vector: 0.7, // Weight for vector similarity (0.0-1.0)
+    text: 0.3 // Weight for text relevance (0.0-1.0)
+  },
+  limit: 5, // Optional: Number of results to return
+  fields: ["title", "content"] // Optional: Fields to search in for text query
+});
+```
+
+### Enhanced Collection Creation
+
+The `CreateCollection` tool now supports more vector configuration options:
+
+```javascript
+// Example usage
+const result = await CreateCollection({
+  collectionName: "my_vector_collection",
+  vector: true, // Enable vector search
+  dimension: 1536, // Vector dimension (e.g., 1536 for OpenAI embeddings)
+  metric: "cosine" // Similarity metric: "cosine", "euclidean", or "dot_product"
+});
+```
+
+### Finding Distinct Values
+
+The new `FindDistinctValues` tool allows you to find unique values for a field:
+
+```javascript
+// Example usage
+const distinctValues = await FindDistinctValues({
+  collectionName: "my_collection",
+  field: "category", // Field to find distinct values for
+  filter: { active: true } // Optional: Filter to apply
+});
+```
+
+### Optimized Bulk Operations
+
+Bulk operations now use native batch processing for better performance:
+
+```javascript
+// Example: Bulk create records
+const result = await BulkCreateRecords({
+  collectionName: "my_collection",
+  records: [
+    { title: "Record 1", content: "Content 1" },
+    { title: "Record 2", content: "Content 2" },
+    // ... more records
+  ]
+});
+
+// Example: Bulk update records
+const updateResult = await BulkUpdateRecords({
+  collectionName: "my_collection",
+  records: [
+    { id: "record1", record: { title: "Updated Title 1" } },
+    { id: "record2", record: { title: "Updated Title 2" } },
+    // ... more records
+  ]
+});
+
+// Example: Bulk delete records
+const deleteResult = await BulkDeleteRecords({
+  collectionName: "my_collection",
+  recordIds: ["record1", "record2", "record3"]
+});
+```
+
+### Improved Error Handling
+
+The server now provides more detailed error messages with error codes to help diagnose issues more easily.

 ## Changelog
 All notable changes to this project will be documented in [this file](./CHANGELOG.md).

docs/improvement_sep2025.md

Lines changed: 198 additions & 0 deletions
@@ -0,0 +1,198 @@
# Astra DB MCP Implementation Analysis and Recommendations

## Executive Summary

The Astra DB MCP Server provides a valuable integration between Large Language Models (LLMs) and Astra DB databases through the Model Context Protocol. After a thorough analysis of the implementation and comparison with the Astra DB API documentation, I've identified several opportunities for optimization and enhancement. The current implementation provides a solid foundation but could benefit from improvements in performance, functionality, security, and documentation.

## Key Findings

1. **Architecture and Implementation**: The server is well-structured with clear separation of concerns between server setup, database connection, tool implementation, and security features. It provides a comprehensive set of tools for basic database operations.

2. **API Coverage**: While the implementation covers most essential operations, there are gaps compared to the full Astra DB API, particularly in advanced vector capabilities, specialized query operations, and configuration options.

3. **Performance Considerations**: The current implementation of bulk operations uses multiple individual operations rather than native batch processing, which impacts performance. There are also opportunities for connection optimization and caching.

4. **Security and Error Handling**: The implementation includes good sanitization for prompt injection prevention but could benefit from enhanced error handling, structured error responses, and additional security features like rate limiting.

5. **Vector Database Support**: Basic vector collection creation is supported, but advanced vector search capabilities, hybrid search, and specialized vector configurations are not fully implemented.

6. **Documentation**: The current documentation provides basic setup instructions but lacks comprehensive tool reference, examples, tutorials, and architectural guidance.

## Detailed Recommendations

### 1. Performance Optimizations

#### Bulk Operations Enhancement
```typescript
// Current implementation
const insertPromises = records.map((record) => collection.insertOne(record));
const results = await Promise.all(insertPromises);

// Recommended enhancement
const result = await collection.insertMany(records);
```

**Benefits**: Significant performance improvement by reducing network requests and leveraging native batch processing.
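
The prioritized list in this document also calls for chunking large datasets. A minimal sketch of that idea follows, assuming a caller-supplied `insertBatch` callback (a hypothetical name that could, for instance, wrap `collection.insertMany`) and an arbitrary default chunk size of 20 that is not taken from any Astra DB limit.

```typescript
// Split an array into consecutive chunks of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Hypothetical callback shape: inserts one batch and resolves to the
// number of records it inserted.
type InsertBatch<T> = (batch: T[]) => Promise<number>;

// Insert records batch by batch, sequentially, and return the total count.
async function insertInChunks<T>(
  records: readonly T[],
  insertBatch: InsertBatch<T>,
  chunkSize = 20, // assumed default, tune for the target database
): Promise<number> {
  let inserted = 0;
  for (const batch of chunk(records, chunkSize)) {
    inserted += await insertBatch(batch);
  }
  return inserted;
}
```

Running the batches sequentially keeps memory and request pressure predictable; a bounded-concurrency variant is possible if throughput matters more.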

#### Connection Optimization
Implement connection pooling or reuse to avoid creating new database connections for each operation.
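
One simple way to get connection reuse is a lazily created, module-level handle. The sketch below is illustrative only: `lazy`, `DbHandle`, `getDb`, and `connectionsOpened` are hypothetical names, and the factory stands in for whatever call actually opens the Astra DB connection.

```typescript
// Wrap a factory so the value is created on first use and reused afterwards.
function lazy<T>(factory: () => T): () => T {
  let instance: T | undefined;
  let created = false;
  return () => {
    if (!created) {
      instance = factory();
      created = true;
    }
    return instance as T;
  };
}

// Hypothetical stand-in for a real database handle.
interface DbHandle {
  id: number;
}

// Counts how many times the factory ran, to make the reuse visible.
let connectionsOpened = 0;

// Every caller shares the same handle; the factory runs at most once.
const getDb = lazy<DbHandle>(() => {
  connectionsOpened += 1;
  return { id: connectionsOpened };
});
```

Each tool handler would call `getDb()` instead of constructing its own client, so the cost of connecting is paid once per process.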

#### Caching Mechanism
Add caching for frequently accessed data with TTL-based invalidation to reduce database load.
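
A TTL-based cache can be sketched in a few lines. The `TtlCache` class below is a hypothetical helper, not part of the server; the clock is injectable so expiry can be exercised without waiting.

```typescript
// In-memory cache whose entries expire `ttlMs` milliseconds after being set.
class TtlCache<K, V> {
  private readonly entries = new Map<K, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Return the cached value, or undefined if it is missing or expired.
  get(key: K): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) {
      return undefined;
    }
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // evict lazily on read
      return undefined;
    }
    return entry.value;
  }

  // Store a value and (re)start its time-to-live.
  set(key: K, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Explicit invalidation, e.g. after a write to the underlying collection.
  delete(key: K): boolean {
    return this.entries.delete(key);
  }
}
```

A read-heavy tool such as a collection listing could consult the cache first and call `delete` whenever a collection is created, updated, or dropped.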

### 2. Vector Database Capabilities

#### Vector Search Implementation
```typescript
// Assumes `db` and `sanitizeRecordData` are provided by the surrounding module.
export async function VectorSearch({
  collectionName,
  queryVector,
  limit = 10,
  minScore = 0.0,
  filter = {},
}) {
  const collection = db.collection(collectionName);

  // Vector search is expressed as a sort on `$vector`, not as a filter clause.
  const results = await collection
    .find(filter, {
      sort: { $vector: queryVector },
      limit,
      includeSimilarity: true,
    })
    .toArray();

  // Apply the minimum-score threshold client-side using `$similarity`.
  return sanitizeRecordData(
    results.filter((doc) => (doc.$similarity ?? 0) >= minScore)
  );
}
```

#### Hybrid Search Support
Implement hybrid search combining vector similarity with traditional text search.
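
One possible way to combine the two signals is a weighted sum of per-document scores. The sketch below assumes both scores are already normalized to the 0.0-1.0 range; `ScoredDoc`, `HybridWeights`, and `hybridRank` are hypothetical names, and the 0.7/0.3 default mirrors the weights shown in the README example rather than any tuned value.

```typescript
// A candidate document with a vector-similarity score and a text-relevance
// score, both assumed to be normalized to [0, 1].
interface ScoredDoc {
  id: string;
  vectorScore: number;
  textScore: number;
}

interface HybridWeights {
  vector: number;
  text: number;
}

// Rank documents by the weighted average of their two scores.
function hybridRank(
  docs: readonly ScoredDoc[],
  weights: HybridWeights = { vector: 0.7, text: 0.3 },
  limit = 10,
): { id: string; score: number }[] {
  const total = weights.vector + weights.text;
  if (total <= 0) {
    throw new RangeError("weights must sum to a positive number");
  }
  return docs
    .map((doc) => ({
      id: doc.id,
      // Dividing by the total keeps the score in [0, 1] for any weights.
      score: (weights.vector * doc.vectorScore + weights.text * doc.textScore) / total,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

Other fusion schemes (for example rank-based fusion) avoid the normalization assumption; the weighted sum is shown only because it matches the `weights` parameter described for `HybridSearch`.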

#### Enhanced Vector Configuration
Add support for different similarity metrics, indexing algorithms, and configuration parameters.

### 3. Error Handling and Security

#### Structured Error Responses
Implement consistent error response structure with error codes, messages, and troubleshooting information.
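
A consistent error shape might look like the following sketch. The error codes, the `ToolError` class, and `toErrorResponse` are illustrative assumptions, not the server's actual types.

```typescript
// Hypothetical set of error codes; a real server would define its own.
type ErrorCode = "INVALID_ARGUMENT" | "NOT_FOUND" | "UPSTREAM_FAILURE";

// The shape every failed tool call would return.
interface ErrorResponse {
  ok: false;
  error: { code: ErrorCode; message: string; hint?: string };
}

// Error type that carries a code and an optional troubleshooting hint.
class ToolError extends Error {
  readonly code: ErrorCode;
  readonly hint?: string;

  constructor(code: ErrorCode, message: string, hint?: string) {
    super(message);
    // Keep `instanceof` working even when compiled to older targets.
    Object.setPrototypeOf(this, ToolError.prototype);
    this.name = "ToolError";
    this.code = code;
    this.hint = hint;
  }
}

// Convert anything thrown into the structured response shape.
function toErrorResponse(err: unknown): ErrorResponse {
  if (err instanceof ToolError) {
    return { ok: false, error: { code: err.code, message: err.message, hint: err.hint } };
  }
  const message = err instanceof Error ? err.message : String(err);
  return { ok: false, error: { code: "UPSTREAM_FAILURE", message } };
}
```

Wrapping each tool handler in a single `try`/`catch` that calls `toErrorResponse` keeps the mapping in one place.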

#### Rate Limiting
Add rate limiting to prevent abuse and ensure system stability.
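
A token bucket is one common way to do this. The sketch below is a generic, in-process version with hypothetical names (`TokenBucket`, `tryAcquire`) and an injectable clock; it is not tied to any particular MCP or Astra DB API.

```typescript
// Token bucket: holds up to `capacity` tokens and refills continuously at
// `refillPerSecond`. Each permitted request consumes one token.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: () => number;

  constructor(
    capacity: number,
    refillPerSecond: number,
    now: () => number = () => Date.now(),
  ) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity; // start full so initial bursts are allowed
    this.lastRefill = now();
  }

  // Returns true and consumes a token if the request is allowed.
  tryAcquire(): boolean {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond,
    );
    this.lastRefill = current;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

A tool dispatcher could keep one bucket per client or per tool and return a structured error when `tryAcquire()` is false.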

#### Audit Logging
Implement audit logging for security-sensitive operations.

### 4. API Coverage Expansion

#### Find Distinct Values
Implement the missing "Find distinct values" operation from the API.

#### Advanced Query Filters
Enhance query capabilities with support for complex filters, nested fields, and range queries.

#### Cursor-based Pagination
Implement cursor-based pagination for efficient handling of large result sets.
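
The idea can be illustrated with an in-memory stand-in, assuming records are sorted by a unique string `id` and the cursor is simply the last `id` returned. `Page` and `paginate` are hypothetical names; a real implementation would pass the cursor to the database query instead of scanning an array.

```typescript
// One page of results plus the cursor needed to fetch the next page.
interface Page<T> {
  items: T[];
  nextCursor: string | null; // null means there are no more pages
}

// Return up to `limit` items that come strictly after `cursor`.
// `sortedById` must already be sorted ascending by `id`.
function paginate<T extends { id: string }>(
  sortedById: readonly T[],
  limit: number,
  cursor: string | null = null,
): Page<T> {
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new RangeError("limit must be a positive integer");
  }
  const found =
    cursor === null ? 0 : sortedById.findIndex((item) => item.id > cursor);
  const from = found === -1 ? sortedById.length : found;
  const items = sortedById.slice(from, from + limit);
  const hasMore = from + items.length < sortedById.length;
  return {
    items,
    nextCursor: hasMore ? items[items.length - 1].id : null,
  };
}
```

Unlike offset-based paging, the cursor stays valid when earlier rows are inserted or deleted between requests.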

### 5. Documentation Improvements

#### Comprehensive Tool Reference
Create detailed documentation for each tool with parameters, response formats, examples, and error scenarios.

#### Interactive Examples
Develop workflow examples, code snippets, and integration examples with popular LLM platforms.

#### Architecture Documentation
Create diagrams and explanations of the system architecture, data flow, and integration points.

## Prioritized List of Recommendations

### High Priority (Critical Improvements)

1. **Optimize Bulk Operations**
   - Replace individual operations with native batch processing
   - Implement chunking for large datasets
   - Add transaction support for atomic operations
   - Estimated impact: High (significant performance improvement)

2. **Enhance Vector Database Capabilities**
   - Implement vector search functionality
   - Add support for hybrid search
   - Support additional vector configuration options
   - Estimated impact: High (enables key AI use cases)

3. **Improve Error Handling**
   - Implement structured error responses
   - Add specific error types and codes
   - Enhance error recovery mechanisms
   - Estimated impact: High (improves reliability and user experience)

4. **Enhance Security Features**
   - Implement rate limiting
   - Add audit logging
   - Improve credential validation
   - Estimated impact: High (addresses security concerns)

### Medium Priority (Significant Enhancements)

5. **Expand API Coverage**
   - Implement missing API endpoints (find distinct values, replace document)
   - Add support for advanced query filters
   - Implement cursor-based pagination
   - Estimated impact: Medium (increases functionality)

6. **Improve Documentation**
   - Create comprehensive tool reference
   - Add examples and tutorials
   - Enhance code documentation
   - Estimated impact: Medium (improves developer experience)

7. **Implement Performance Optimizations**
   - Add caching mechanisms
   - Implement connection pooling
   - Add query optimization options
   - Estimated impact: Medium (improves performance)

8. **Enhance Collection Management**
   - Add support for indexing strategies
   - Implement field-specific indexing
   - Add collection metadata support
   - Estimated impact: Medium (improves flexibility)

### Lower Priority (Nice-to-Have Improvements)

9. **Improve User Experience**
   - Enhance response formats
   - Add contextual help
   - Implement better parameter validation
   - Estimated impact: Low to Medium (improves usability)

10. **Add Advanced Features**
    - Implement embedding generation integration
    - Add support for data masking
    - Implement asynchronous processing
    - Estimated impact: Low to Medium (adds specialized capabilities)

## Implementation Roadmap

### Phase 1: Foundation Improvements (1-3 months)
- Optimize bulk operations
- Enhance error handling
- Improve basic security features
- Update core documentation

### Phase 2: Feature Expansion (3-6 months)
- Enhance vector database capabilities
- Expand API coverage
- Implement performance optimizations
- Improve collection management

### Phase 3: Advanced Capabilities (6+ months)
- Add advanced security features
- Implement advanced vector search capabilities
- Enhance user experience
- Add specialized features for AI applications

## Conclusion

The Astra DB MCP Server provides a valuable integration between LLMs and Astra DB, but there are significant opportunities for enhancement. By implementing the recommended improvements, the server can offer better performance, more comprehensive functionality, enhanced security, and improved developer experience. The prioritized approach allows for incremental improvements while focusing first on the most critical aspects.
