Skip to content

Commit 43bda24

Browse files
authored
Merge pull request #187 from MetaCell/feature/cache_mech
Caching mechanism
2 parents 8deb00a + b607965 commit 43bda24

32 files changed

Lines changed: 1085 additions & 201 deletions

applications/virtual-fly-brain/Dockerfile

Lines changed: 22 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -20,6 +20,11 @@ RUN apt-get update && apt-get install -y \
2020
libosmesa6 \
2121
freeglut3-dev \
2222
git \
23+
cron \
24+
x11-utils \
25+
mesa-utils \
26+
libx11-dev \
27+
libxext-dev \
2328
&& rm -rf /var/lib/apt/lists/*
2429

2530
RUN apt-get remove -y nodejs npm || true
@@ -33,6 +38,8 @@ RUN npm install -g yarn && \
3338

3439
# Set up virtual display for OpenGL
3540
ENV DISPLAY=:99
41+
ENV LIBGL_ALWAYS_INDIRECT=1
42+
ENV MESA_GL_VERSION_OVERRIDE=3.3
3643

3744
COPY frontend ${BUILDDIR}
3845
WORKDIR ${BUILDDIR}
@@ -48,7 +55,21 @@ COPY backend/ /usr/src/app
4855
RUN pip3 install -e .
4956

5057
RUN cp -r ${BUILDDIR}/dist/ /usr/src/app/virtual_fly_brain/www
58+
59+
# Set up cache cleanup cron job
60+
COPY backend/vfb_cache_cleanup.sh /usr/local/bin/vfb_cache_cleanup.sh
61+
RUN chmod +x /usr/local/bin/vfb_cache_cleanup.sh
62+
63+
# Copy and set up entrypoint script
64+
COPY backend/docker-entrypoint.sh /usr/local/bin/docker-entrypoint.sh
65+
RUN chmod +x /usr/local/bin/docker-entrypoint.sh
66+
67+
# Create cron job to run cache cleanup daily at 2 AM
68+
RUN echo "0 2 * * * /usr/local/bin/vfb_cache_cleanup.sh" | crontab -
69+
70+
# Create log directory for cache cleanup logs
71+
RUN mkdir -p /var/log && touch /var/log/vfb_cache_cleanup.log
5172
WORKDIR /usr/src/app/virtual_fly_brain
5273

5374
EXPOSE 8080
54-
ENTRYPOINT ["gunicorn", "--log-level=info", "--preload", "--bind=0.0.0.0:8080", "virtual_fly_brain.__main__:app"]
75+
ENTRYPOINT ["/usr/local/bin/docker-entrypoint.sh", "gunicorn", "--log-level=info", "--preload", "--bind=0.0.0.0:8080", "--timeout=120", "--graceful-timeout=30", "--keep-alive=5", "virtual_fly_brain.__main__:app"]
Lines changed: 136 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,136 @@
1+
# API Request Logging
2+
3+
This document describes the comprehensive logging system implemented for the Virtual Fly Brain REST API.
4+
5+
## Overview
6+
7+
The Flask application now includes detailed logging for all REST endpoint requests. This logging is designed to be visible in Kubernetes pod logs and provides comprehensive tracking of API usage.
8+
9+
## Features
10+
11+
### Request Tracking
12+
- **Unique Request IDs**: Each request gets a unique identifier combining timestamp and microsecond precision
13+
- **Complete Request Data**: Logs include all input parameters, headers, and metadata
14+
- **Response Tracking**: Success/error status and response times are logged
15+
- **Error Details**: Detailed error information including exception type and message
16+
17+
### Log Format
18+
All logs use structured JSON format for easy parsing and analysis:
19+
20+
```json
21+
{
22+
"request_id": "2024-01-15T10:30:45.123456_567890",
23+
"endpoint": "term_info",
24+
"method": "GET",
25+
"url": "http://localhost:8080/get_term_info?id=FBbt_00003624",
26+
"path": "/get_term_info",
27+
"remote_addr": "192.168.1.100",
28+
"user_agent": "Mozilla/5.0...",
29+
"args": {"id": "FBbt_00003624"},
30+
"form_data": {},
31+
"json_data": null,
32+
"content_length": null
33+
}
34+
```
35+
36+
## Logged Endpoints
37+
38+
The following REST endpoints are instrumented with logging:
39+
40+
- `GET /` - Static file serving (index.html)
41+
- `GET /get_instances` - Get instances by short_form parameter
42+
- `GET /get_term_info` - Get term information by id parameter
43+
- `GET /run_query` - Run queries by id and query_type parameters
44+
45+
## Log Types
46+
47+
### REQUEST_START
48+
Logged when a request begins processing:
49+
- Request metadata (URL, method, headers)
50+
- Input parameters (query args, form data, JSON body)
51+
- Client information (IP, User-Agent)
52+
53+
### REQUEST_END
54+
Logged when a request completes successfully:
55+
- Request ID (matches REQUEST_START)
56+
- Processing duration in milliseconds
57+
- Response status code
58+
59+
### REQUEST_ERROR
60+
Logged when a request fails:
61+
- Request ID (matches REQUEST_START)
62+
- Processing duration until error
63+
- Exception type and error message
64+
65+
## Example Log Output
66+
67+
```
68+
2024-01-15 10:30:45,123 - vfb_api - INFO - REQUEST_START: {"request_id": "2024-01-15T10:30:45.123456_567890", "endpoint": "term_info", "method": "GET", "url": "http://localhost:8080/get_term_info?id=FBbt_00003624", "path": "/get_term_info", "remote_addr": "192.168.1.100", "user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36", "args": {"id": "FBbt_00003624"}, "form_data": {}, "json_data": null, "content_length": null}
69+
70+
2024-01-15 10:30:45,456 - vfb_api - INFO - REQUEST_END: {"request_id": "2024-01-15T10:30:45.123456_567890", "status": "SUCCESS", "duration_ms": 333.45, "response_status": 200}
71+
```
72+
73+
## Kubernetes Integration
74+
75+
The logging is configured to write to stdout, which makes it automatically available through:
76+
77+
```bash
78+
# View real-time logs
79+
kubectl logs -f <pod-name>
80+
81+
# View logs with timestamps
82+
kubectl logs --timestamps <pod-name>
83+
84+
# View recent logs
85+
kubectl logs --tail=100 <pod-name>
86+
87+
# Follow logs for a specific container
88+
kubectl logs -f <pod-name> -c <container-name>
89+
```
90+
91+
## Log Analysis
92+
93+
The structured JSON format allows for easy log analysis:
94+
95+
```bash
96+
# Filter for specific endpoints
97+
kubectl logs <pod-name> | grep "get_term_info"
98+
99+
# Extract request IDs for debugging
100+
kubectl logs <pod-name> | grep "REQUEST_START" | jq -r '.request_id'
101+
102+
# Monitor response times
103+
kubectl logs <pod-name> | grep "REQUEST_END" | jq '.duration_ms'
104+
105+
# Track errors
106+
kubectl logs <pod-name> | grep "REQUEST_ERROR"
107+
```
108+
109+
## Configuration
110+
111+
The logging system is configured in `__main__.py`:
112+
113+
- **Log Level**: INFO (captures all request information)
114+
- **Format**: Timestamped with logger name and level
115+
- **Handler**: StreamHandler (outputs to stdout)
116+
- **Logger Name**: `vfb_api`
117+
118+
## Implementation Details
119+
120+
The logging is implemented using a Python decorator `@log_request` that:
121+
122+
1. Captures request start time and generates unique ID
123+
2. Extracts all request data (URL, parameters, headers)
124+
3. Logs REQUEST_START with complete request information
125+
4. Executes the endpoint function
126+
5. Logs REQUEST_END with timing and status information
127+
6. Handles exceptions and logs REQUEST_ERROR with error details
128+
129+
## Benefits
130+
131+
- **Production Monitoring**: Track API usage patterns and performance
132+
- **Debugging**: Trace specific requests through the system
133+
- **Performance Analysis**: Monitor response times and identify bottlenecks
134+
- **Error Tracking**: Capture and analyze application errors
135+
- **Compliance**: Maintain audit trail of API access
136+
- **Capacity Planning**: Understand load patterns and peak usage
Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,11 @@
1+
#!/bin/bash
2+
3+
# Start cron service in the background
4+
service cron start
5+
6+
# Create cache directories with proper permissions
7+
mkdir -p /tmp/vfb_cache/term_info /tmp/vfb_cache/queries
8+
chmod 755 /tmp/vfb_cache /tmp/vfb_cache/term_info /tmp/vfb_cache/queries
9+
10+
# Start the main application
11+
exec "$@"

applications/virtual-fly-brain/backend/requirements.txt

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -16,4 +16,4 @@ dataclasses_json >= 0.5.7
1616
dacite >= 1.6.0
1717
pandas >= 1.0.0
1818
numpy >= 1.19.0
19-
git+https://github.com/VirtualFlyBrain/VFBquery.git
19+
vfbquery==0.3.4
Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
#!/bin/bash
2+
# VFB Cache Cleanup Job
3+
# This script cleans up expired cache files and can be run as a cron job
4+
5+
# Set working directory to the application root
6+
cd /usr/src/app
7+
8+
# Run cache cleanup using the installed Python environment
9+
python3 -m virtual_fly_brain.services.cache_manager cleanup
10+
11+
# Log the cleanup with timestamp
12+
echo "$(date): VFB cache cleanup completed" >> /var/log/vfb_cache_cleanup.log

applications/virtual-fly-brain/backend/virtual_fly_brain/__main__.py

Lines changed: 83 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,10 @@
44
import werkzeug
55
import numpy as np
66
import pandas as pd
7+
import logging
8+
import time
9+
from datetime import datetime
10+
from virtual_fly_brain.services.numpy_encoder import NumpyEncoder
711
vfb = None
812
try:
913
import vfbquery as vfb
@@ -13,41 +17,91 @@
1317
from virtual_fly_brain.services.queries import run_query
1418
from virtual_fly_brain.services.term_info import get_term_info
1519

16-
class NumpyEncoder(json.JSONEncoder):
17-
""" Custom encoder for numpy data types """
18-
def default(self, obj):
19-
if isinstance(obj, (np.int_, np.intc, np.intp, np.int8,
20-
np.int16, np.int32, np.int64, np.uint8,
21-
np.uint16, np.uint32, np.uint64)):
2220

23-
return int(obj)
24-
25-
elif isinstance(obj, (np.float_, np.float16, np.float32, np.float64)):
26-
return float(obj)
27-
28-
elif isinstance(obj, (np.complex_, np.complex64, np.complex128)):
29-
return {'real': obj.real, 'imag': obj.imag}
30-
31-
elif isinstance(obj, (np.ndarray,)):
32-
return obj.tolist()
33-
34-
elif isinstance(obj, (np.bool_)):
35-
return bool(obj)
36-
37-
elif isinstance(obj, (np.void)):
38-
return None
39-
40-
return json.JSONEncoder.default(self, obj)
21+
# Configure logging
22+
logging.basicConfig(
23+
level=logging.INFO,
24+
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
25+
handlers=[
26+
logging.StreamHandler() # This ensures logs go to stdout for pod logs
27+
]
28+
)
29+
30+
# Create logger for API requests
31+
api_logger = logging.getLogger('vfb_api')
32+
33+
34+
def log_request(func):
35+
"""Decorator to log REST endpoint requests with input data"""
36+
def wrapper(*args, **kwargs):
37+
start_time = time.time()
38+
request_id = f"{datetime.now().isoformat()}_{int(time.time() * 1000000) % 1000000}"
39+
40+
# Log request start
41+
request_data = {
42+
'request_id': request_id,
43+
'endpoint': flask.request.endpoint,
44+
'method': flask.request.method,
45+
'url': flask.request.url,
46+
'path': flask.request.path,
47+
'remote_addr': flask.request.remote_addr,
48+
'user_agent': flask.request.headers.get('User-Agent', ''),
49+
'args': dict(flask.request.args),
50+
'form_data': dict(flask.request.form) if flask.request.form else {},
51+
'json_data': flask.request.get_json(silent=True) if flask.request.is_json else None,
52+
'content_length': flask.request.content_length
53+
}
54+
55+
api_logger.info(f"REQUEST_START: {json.dumps(request_data, cls=NumpyEncoder)}")
56+
57+
try:
58+
# Execute the actual endpoint function
59+
result = func(*args, **kwargs)
60+
61+
# Log successful response
62+
end_time = time.time()
63+
duration_ms = round((end_time - start_time) * 1000, 2)
64+
65+
response_data = {
66+
'request_id': request_id,
67+
'status': 'SUCCESS',
68+
'duration_ms': duration_ms,
69+
'response_status': getattr(result, 'status_code', 200) if hasattr(result, 'status_code') else 200
70+
}
71+
72+
api_logger.info(f"REQUEST_END: {json.dumps(response_data, cls=NumpyEncoder)}")
73+
return result
74+
75+
except Exception as e:
76+
# Log error response
77+
end_time = time.time()
78+
duration_ms = round((end_time - start_time) * 1000, 2)
79+
80+
error_data = {
81+
'request_id': request_id,
82+
'status': 'ERROR',
83+
'duration_ms': duration_ms,
84+
'error_type': type(e).__name__,
85+
'error_message': str(e)
86+
}
87+
88+
api_logger.error(f"REQUEST_ERROR: {json.dumps(error_data, cls=NumpyEncoder)}")
89+
raise
90+
91+
wrapper.__name__ = func.__name__
92+
return wrapper
4193

4294

4395
def init_webapp_routes(app):
4496
@app.route('/', methods=['GET'])
97+
# @log_request
4598
def index():
4699
return flask.send_from_directory("www", 'index.html')
47100

48101

49102
@app.route('/get_instances', methods=['GET'])
50103
@cross_origin(supports_credentials=True)
104+
# @log_request
51105
def instances():
52106
short_form = flask.request.args.get('short_form')
53107
if not short_form:
@@ -66,6 +120,7 @@ def instances():
66120

67121
@app.route('/get_term_info', methods=['GET'])
68122
@cross_origin(supports_credentials=True)
123+
# @log_request
69124
def term_info():
70125
term_id = flask.request.args.get('id')
71126
if not term_id:
@@ -82,6 +137,7 @@ def term_info():
82137

83138
@app.route('/run_query', methods=['GET'])
84139
@cross_origin(supports_credentials=True)
140+
# @log_request
85141
def get_query_results():
86142
term_id = flask.request.args.get('id')
87143
query_type = flask.request.args.get('query_type')
@@ -120,10 +176,11 @@ def handle_bad_request(e):
120176
)
121177
CORS(app, support_credentials=True)
122178
init_webapp_routes(app)
123-
# TODO: fix this to use the init_flask function from cloudharness
124-
# app = init_flask(title="Virtual Fly Brain REST API", webapp=True, init_app_fn=init_webapp_routes)
179+
125180

126181
def main():
182+
api_logger.info("Starting Virtual Fly Brain REST API server on host='0.0.0.0', port=8080")
183+
api_logger.info("Logging is configured for API request tracking")
127184
app.run(host='0.0.0.0', port=8080)
128185

129186
if __name__ == '__main__':

0 commit comments

Comments
 (0)