Introduction
When dealing with Virtru Audit client datasets, refining data based on time intervals significantly improves efficiency and accuracy. Whether you're collecting daily summaries, hourly logs, or even second-level events, customizing your time-based queries ensures that you fetch exactly what you need without unnecessary data overload.
In this article, we’ll explore:
- How to generate time-based intervals dynamically
- How data is structured by the day
- How to modify the current day's starting time (e.g., 01:00:00Z)
- How to fetch the current day's data first using reverse ordering
Time-Based Data Retrieval
The Virtru API allows users to query data between a start date and an end date. However, if the dataset spans a long period, making one massive request could:
- Overload the API
- Cause timeouts or slow response times
- Retrieve unnecessary data beyond what is needed
To optimize retrieval, we use time intervals to break down queries into smaller chunks, ensuring that each request focuses only on a specific timeframe.
Setting the Current Day and Timestamp as the End Date
To ensure that the script fetches data up to the present moment, we dynamically set end_date_str to the current date and time in UTC. This:
- Ensures that the API fetches the most recent data
- Updates automatically every time the script runs
- Removes the need to manually change the end date
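As a minimal sketch of the step described above, the end date can be computed from the system clock in UTC. The variable name end_date_str comes from the article; the helper name now_utc and the exact string format are assumptions chosen to match the other timestamps shown here.

```python
from datetime import datetime, timezone

# Current moment in UTC, truncated to whole seconds.
now_utc = datetime.now(timezone.utc).replace(microsecond=0)

# ISO 8601 string with a trailing 'Z', matching the format of the other examples.
end_date_str = now_utc.strftime('%Y-%m-%dT%H:%M:%SZ')

# Keep the timezone-aware datetime as well, for interval arithmetic.
end_date = now_utc

print(f"End Date: {end_date_str}")
```

Because the value is recomputed on every run, no manual edit of the end date is needed.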
Generating Time Intervals Dynamically
The function below divides a given date range into smaller time intervals, making it easier to process data efficiently. Once data is fetched using the above method, it is typically stored in day-based folders for easy access.
Python Function - Generating Time-Based Intervals
from datetime import datetime, timezone, timedelta

def generate_date_intervals(start_date, end_date, delta):
    """
    Generates time intervals between a start_date and end_date.
    'delta' determines the size of each interval.
    """
    current_date = start_date
    while current_date < end_date:
        # Clamp the last interval so it never runs past end_date
        interval_end = min(current_date + delta, end_date)
        yield (current_date, interval_end)
        current_date = interval_end
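A short usage example may help show what the generator yields. The dates below are illustrative only; the range is deliberately 2.5 days long so that the final interval is shorter than the others.

```python
from datetime import datetime, timezone, timedelta

def generate_date_intervals(start_date, end_date, delta):
    """Yield (start, end) pairs covering start_date..end_date in steps of delta."""
    current_date = start_date
    while current_date < end_date:
        interval_end = min(current_date + delta, end_date)
        yield (current_date, interval_end)
        current_date = interval_end

# Illustrative range: 2.5 days, starting at 01:00 UTC.
start = datetime(2025, 1, 1, 1, 0, tzinfo=timezone.utc)
end = datetime(2025, 1, 3, 13, 0, tzinfo=timezone.utc)

intervals = list(generate_date_intervals(start, end, timedelta(days=1)))
for interval_start, interval_end in intervals:
    print(interval_start.isoformat(), "->", interval_end.isoformat())
# Three intervals are produced: two full days, then a final 12-hour interval.
```

Each interval's end is the next interval's start, so depending on whether the API treats the end bound as inclusive, events that fall exactly on a boundary may need de-duplication.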
Fetching the Current Day’s Data First (Reverse Order)
By default, data is retrieved from the earliest date to the latest. However, if you need to prioritize the current day's data first, you can reverse the order using this function:
Fetching Data in Reverse Order
def generate_date_intervals(start_date, end_date, delta):
    """
    Generates time intervals in reverse order (latest first).
    """
    current_date = end_date
    while current_date > start_date:
        # Clamp the oldest interval so it never runs before start_date
        interval_start = max(current_date - delta, start_date)
        yield (interval_start, current_date)
        current_date = interval_start
This ensures that the most recent data is retrieved first, helping users analyze today's data before historical data.
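One detail worth checking is that the reverse generator anchors its chunks at end_date rather than start_date, so when the range is not a whole multiple of the interval, the chunk boundaries differ from the forward version. The sketch below compares the two; the names forward_intervals and reverse_intervals are hypothetical and are used only so both versions can sit side by side.

```python
from datetime import datetime, timezone, timedelta

def forward_intervals(start_date, end_date, delta):
    """Earliest-first intervals, anchored at start_date."""
    current = start_date
    while current < end_date:
        interval_end = min(current + delta, end_date)
        yield (current, interval_end)
        current = interval_end

def reverse_intervals(start_date, end_date, delta):
    """Latest-first intervals, anchored at end_date."""
    current = end_date
    while current > start_date:
        interval_start = max(current - delta, start_date)
        yield (interval_start, current)
        current = interval_start

start = datetime(2025, 1, 1, 1, 0, tzinfo=timezone.utc)
end = datetime(2025, 1, 3, 13, 0, tzinfo=timezone.utc)  # 2.5 days after start
delta = timedelta(days=1)

forward = list(forward_intervals(start, end, delta))
backward = list(reverse_intervals(start, end, delta))

# Same boundaries as the forward version, but newest first.
forward_newest_first = list(reversed(forward))

print("reverse generator, first chunk:", backward[0])
print("reversed forward list, first chunk:", forward_newest_first[0])
```

With the reverse generator the short leftover chunk is the oldest one. If chunks should stay aligned to the configured start time (for example 01:00:00Z each day), reversing the forward list is one way to get latest-first order with unchanged boundaries.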
Refining Data with Different Time Intervals
Depending on how granular you want the data, you can adjust the interval length.
Daily Intervals (Default)
interval_length = timedelta(days=1) # Fetches data in 1-day chunks
Best for: Large datasets that do not change frequently (e.g., transaction logs, audit logs).
Hourly Intervals
interval_length = timedelta(hours=1) # Fetches data in 1-hour chunks
Best for: Tracking hourly trends (e.g., server logs, usage analytics).
30-Minute Intervals
interval_length = timedelta(minutes=30) # Fetches data in 30-minute chunks
Best for: Real-time monitoring, such as tracking user activity in an app.
30-Second Intervals
interval_length = timedelta(seconds=30) # Fetches data in 30-second chunks
Best for: High-frequency data collection (e.g., IoT sensor data, stock trading analysis).
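Smaller intervals mean more requests for the same date range, which is worth estimating before choosing a granularity. The following sketch counts the chunks needed for a 10-day range at each of the interval lengths above; count_intervals is a hypothetical helper, not part of any Virtru tooling.

```python
from datetime import timedelta

def count_intervals(total_range, interval_length):
    """Number of chunks needed to cover total_range (the last may be shorter)."""
    # Ceiling division on timedeltas using integer floor division.
    return -(-total_range // interval_length)

total_range = timedelta(days=10)

options = {
    "daily": timedelta(days=1),
    "hourly": timedelta(hours=1),
    "30-minute": timedelta(minutes=30),
    "30-second": timedelta(seconds=30),
}

counts = {name: count_intervals(total_range, length) for name, length in options.items()}
for name, count in counts.items():
    print(f"{name}: {count} requests")
# daily: 10, hourly: 240, 30-minute: 480, 30-second: 28800
```

The count assumes one request per interval; if responses are paginated, the real number of API calls would be higher.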
Start Time is in UTC
By default, all time intervals use UTC timestamps. This ensures:
- Consistency across global datasets
- Avoidance of timezone-related discrepancies
- Standardized timekeeping in API queries
The examples in this article start each day at 01:00:00Z, that is (hour=1, minute=0, second=0, microsecond=0):
# Format: YYYY-MM-DDTHH:MM:SSZ
start_date_str = '2025-01-31T01:00:00Z'  # This can be changed to any starting date
Why Change the Start Time?
- If your data only starts generating at a later time, this avoids unnecessary requests.
- Helps focus on business hours instead of midnight resets.
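The hour, minute, second, and microsecond fields listed above look like arguments to datetime.replace. Assuming that is how the start time is pinned, a minimal sketch for starting the current day at 01:00:00Z might look like this; the variable names now_utc and start_of_day are illustrative.

```python
from datetime import datetime, timezone, timedelta

now_utc = datetime.now(timezone.utc)

# Pin the start of the current day to 01:00:00 UTC.
start_of_day = now_utc.replace(hour=1, minute=0, second=0, microsecond=0)

# If the script runs between 00:00 and 01:00 UTC, that time is still in the
# future, so fall back to the previous day's 01:00:00.
if start_of_day > now_utc:
    start_of_day -= timedelta(days=1)

start_date_str = start_of_day.strftime('%Y-%m-%dT%H:%M:%SZ')
print(f"Start Date: {start_date_str}")
```

Changing the hour argument moves the daily boundary, for example to the start of business hours.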
How to Retrieve Data for a Few Days, Weeks, or Months
To retrieve historical data spanning multiple days, weeks, or even months, you simply need to adjust the start_date and end_date values.
You can manually set a fixed range (e.g., a few days, weeks, or months) by modifying start_date_str and end_date_str:
Fetching Data for 10 Days (Jan 1 - Jan 11, 2025)
from datetime import datetime, timezone, timedelta
# Define a custom date range
start_date_str = '2025-01-01T01:00:00Z' # Start fetching from Jan 1, 2025
end_date_str = '2025-01-11T01:00:00Z' # Stop fetching at Jan 11, 2025
# Convert to timezone-aware UTC datetime objects ('Z' means UTC, i.e. +00:00)
start_date = datetime.fromisoformat(start_date_str.replace('Z', '+00:00'))
end_date = datetime.fromisoformat(end_date_str.replace('Z', '+00:00'))
print(f"Start Date: {start_date}")
print(f"End Date: {end_date}")
# Set the interval length
interval_length = timedelta(days=1) # Fetch data in 1-day chunks
This retrieves all data between 2025-01-01 and 2025-01-11 in daily intervals.
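To connect the date range to actual requests, each interval can be turned into a pair of UTC timestamp strings plus a day-based folder name for storing the result. This is only a sketch: the dictionary keys "start", "end", and "folder", the helper to_utc_string, and the folder naming scheme are all placeholders, not confirmed Virtru API parameters or a documented storage layout.

```python
from datetime import datetime, timedelta

def generate_date_intervals(start_date, end_date, delta):
    """Yield (start, end) pairs covering start_date..end_date in steps of delta."""
    current_date = start_date
    while current_date < end_date:
        interval_end = min(current_date + delta, end_date)
        yield (current_date, interval_end)
        current_date = interval_end

def to_utc_string(moment):
    """Format a UTC datetime as YYYY-MM-DDTHH:MM:SSZ (hypothetical helper)."""
    return moment.strftime('%Y-%m-%dT%H:%M:%SZ')

start_date = datetime.fromisoformat('2025-01-01T01:00:00+00:00')
end_date = datetime.fromisoformat('2025-01-11T01:00:00+00:00')
interval_length = timedelta(days=1)

plan = []
for interval_start, interval_end in generate_date_intervals(start_date, end_date, interval_length):
    plan.append({
        "start": to_utc_string(interval_start),         # placeholder parameter name
        "end": to_utc_string(interval_end),             # placeholder parameter name
        "folder": interval_start.strftime('%Y-%m-%d'),  # assumed day-based folder name
    })

print(len(plan), "requests planned")
print(plan[0])
```

Building the plan first, before issuing any requests, makes it easy to inspect or reorder the chunks.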
Fetch Data for Different Time Ranges
You can modify the date range and interval length based on your needs.
Fetch Data for a Few Days
start_date_str = '2025-01-01T01:00:00Z'
end_date_str = '2025-01-06T01:00:00Z' # 5-day range
interval_length = timedelta(days=1) # Daily data
Fetch Data for a Few Weeks
start_date_str = '2025-01-01T01:00:00Z'
end_date_str = '2025-01-22T01:00:00Z' # 3-week range
interval_length = timedelta(days=7) # Weekly data
Fetch Data for a Few Months
start_date_str = '2024-10-01T01:00:00Z'
end_date_str = '2025-01-01T01:00:00Z' # 3-month range
interval_length = timedelta(days=30) # Fetch data in 30-day (roughly 1-month) chunks
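Note that timedelta has no notion of calendar months, so 30-day chunks drift away from month boundaries. The sketch below lists the chunks for the range above, using only the standard library; the variable names are illustrative.

```python
from datetime import datetime, timezone, timedelta

start = datetime(2024, 10, 1, 1, 0, tzinfo=timezone.utc)
end = datetime(2025, 1, 1, 1, 0, tzinfo=timezone.utc)
delta = timedelta(days=30)

chunks = []
current = start
while current < end:
    chunk_end = min(current + delta, end)
    # Record only the calendar dates to make the drift easy to see.
    chunks.append((current.date().isoformat(), chunk_end.date().isoformat()))
    current = chunk_end

for chunk in chunks:
    print(chunk)
# ('2024-10-01', '2024-10-31')
# ('2024-10-31', '2024-11-30')
# ('2024-11-30', '2024-12-30')
# ('2024-12-30', '2025-01-01')
```

The 92-day range yields three full 30-day chunks and a final 2-day chunk. If chunks must line up with calendar months, the boundaries would need to be computed from the month and year fields instead.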
Conclusion
Refining data retrieval by adjusting time intervals, start times, and retrieval order helps:
- Reduce unnecessary data processing
- Improve data organization
- Ensure efficient analysis and storage
- Optimize API queries
By using dynamic time-based queries, you can fetch exactly the data you need without overloading your system.