Refining Data Retrieval Based on Time Intervals – Virtru

Introduction

When dealing with Virtru Audit client datasets, refining data based on time intervals significantly improves efficiency and accuracy. Whether you're collecting daily summaries, hourly logs, or even second-level events, customizing your time-based queries ensures that you fetch exactly what you need without unnecessary data overload.

In this article, we’ll explore:

How to generate time-based intervals dynamically
How data is structured by the day
How to modify the current day’s starting time (e.g., 01:00:00Z)
How to fetch the current day's data first using reverse ordering

Time-Based Data Retrieval

The Virtru API allows users to query data between a start date and an end date. However, if the dataset spans a long period, making one massive request could:

Overload the API
Cause timeouts or slow response times
Retrieve unnecessary data beyond what is needed

To optimize retrieval, we use time intervals to break down queries into smaller chunks, ensuring that each request focuses only on a specific timeframe.

Setting the Current Day and Timestamp as the End Date

To ensure that the script fetches data up to the present moment, we dynamically set end_date_str to the current date and time in UTC.

from datetime import datetime, timezone

# Set the current date and time as the end date in UTC
end_date_str = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')
end_date = datetime.fromisoformat(end_date_str.rstrip('Z'))

print(f"End Date (Current Timestamp in UTC): {end_date}")

This approach:

Ensures that the script requests data up to the most recent timestamp
Updates automatically every time the script runs
Removes the need to change the end date manually
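Note that the snippet above strips the trailing Z, so end_date is a naive datetime with no timezone attached. If timezone-aware values are preferred, a minimal sketch might look like the following. The helper name format_utc is an assumption for illustration and is not part of the original script.

```python
from datetime import datetime, timezone

def format_utc(dt):
    """Format an aware datetime as an ISO 8601 string ending in 'Z'."""
    return dt.astimezone(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')

# Keep the end date timezone-aware and drop sub-second precision
end_date = datetime.now(timezone.utc).replace(microsecond=0)
end_date_str = format_utc(end_date)

print(f"End Date (Current Timestamp in UTC): {end_date_str}")
```

If you take this route, the start date must be timezone-aware as well, because Python raises a TypeError when a naive datetime is compared with an aware one.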

Generating Time Intervals Dynamically

The function below divides a given date range into smaller time intervals so that each request covers a manageable span. Fetched data is typically stored in day-based folders for easy access.

Python Function - Generating Time-Based Intervals

from datetime import datetime, timezone, timedelta

def generate_date_intervals(start_date, end_date, delta):
    """
    Generates time intervals between a start_date and end_date.
    'delta' determines the size of each interval.
    """
    current_date = start_date
    while current_date < end_date:
        interval_end = min(current_date + delta, end_date)
        yield (current_date, interval_end)
        current_date = interval_end
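
To see what the generator produces, the short example below runs it over a range of two and a half days. The dates are arbitrary sample values chosen for illustration.

```python
from datetime import datetime, timedelta

def generate_date_intervals(start_date, end_date, delta):
    """Yield (interval_start, interval_end) pairs covering the range."""
    current_date = start_date
    while current_date < end_date:
        interval_end = min(current_date + delta, end_date)
        yield (current_date, interval_end)
        current_date = interval_end

start_date = datetime(2025, 1, 1, 1, 0, 0)
end_date = datetime(2025, 1, 3, 13, 0, 0)  # 2.5 days after the start

intervals = list(generate_date_intervals(start_date, end_date, timedelta(days=1)))
for interval_start, interval_end in intervals:
    print(interval_start.isoformat(), "->", interval_end.isoformat())
```

This yields three intervals. The first two are full days, and the last is clipped to 12 hours so that it never runs past end_date.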

Fetching the Current Day’s Data First (Reverse Order)

By default, data is retrieved from the earliest date to the latest. To prioritize the current day's data, swap in the reverse-order version below. It keeps the same name and signature, so it replaces the earlier function and the rest of the script needs no changes.

Fetching Data in Reverse Order

def generate_date_intervals(start_date, end_date, delta):
    """
    Generates time intervals in reverse order (latest first).
    """
    current_date = end_date
    while current_date > start_date:
        interval_start = max(current_date - delta, start_date)
        yield (interval_start, current_date)
        current_date = interval_start

This ensures that the most recent data is retrieved first, helping users analyze today's data before historical data.
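
The example below runs the reverse generator over a sample range. The function is renamed generate_date_intervals_reverse here (an assumed name, used only so that both versions can coexist), and the dates are arbitrary sample values.

```python
from datetime import datetime, timedelta

def generate_date_intervals_reverse(start_date, end_date, delta):
    """Yield (interval_start, interval_end) pairs, latest interval first."""
    current_date = end_date
    while current_date > start_date:
        interval_start = max(current_date - delta, start_date)
        yield (interval_start, current_date)
        current_date = interval_start

start_date = datetime(2025, 1, 1, 1, 0, 0)
end_date = datetime(2025, 1, 3, 13, 0, 0)  # 2.5 days after the start

reverse_intervals = list(
    generate_date_intervals_reverse(start_date, end_date, timedelta(days=1))
)
for interval_start, interval_end in reverse_intervals:
    print(interval_start.isoformat(), "->", interval_end.isoformat())
```

Because the intervals are counted back from end_date, their boundaries fall at 13:00 rather than 01:00, and the shortened interval is the oldest one instead of the newest.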

Refining Data with Different Time Intervals

Depending on how granular you want the data, you can adjust the interval length.

Daily Intervals (Default)

interval_length = timedelta(days=1)  # Fetches data in 1-day chunks

Best for: Large datasets that do not change frequently (e.g., transaction logs, audit logs).

Hourly Intervals

interval_length = timedelta(hours=1)  # Fetches data in 1-hour chunks

Best for: Tracking hourly trends (e.g., server logs, usage analytics).

30-Minute Intervals

interval_length = timedelta(minutes=30)  # Fetches data in 30-minute chunks

Best for: Real-time monitoring, such as tracking user activity in an app.

30-Second Intervals

interval_length = timedelta(seconds=30)  # Fetches data in 30-second chunks

Best for: High-frequency data collection (e.g., IoT sensor data, stock trading analysis).
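
Smaller intervals mean more requests. The sketch below estimates how many intervals each setting produces for one day of data, assuming one request per interval (pagination or retries could add more). The helper name count_intervals is an assumption for illustration.

```python
import math
from datetime import timedelta

def count_intervals(total_range, interval_length):
    """Number of intervals needed to cover total_range."""
    return math.ceil(total_range / interval_length)

one_day = timedelta(days=1)
for label, interval_length in [
    ("1 day", timedelta(days=1)),
    ("1 hour", timedelta(hours=1)),
    ("30 minutes", timedelta(minutes=30)),
    ("30 seconds", timedelta(seconds=30)),
]:
    count = count_intervals(one_day, interval_length)
    print(f"{label}: {count} interval(s) per day")
```

One day of data takes 1, 24, 48, and 2,880 intervals respectively, so the 30-second setting is best reserved for short date ranges.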

Start Time is in UTC

By default, all time intervals use UTC timestamps. This ensures:

Consistency across global datasets
Avoidance of timezone-related discrepancies
Standardized timekeeping in API queries

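To start the day at a time other than midnight, set the time portion of the start timestamp. The example below starts at 01:00:00Z (hour=1, minute=0, second=0, microsecond=0):

# Format: YYYY-MM-DDTHH:MM:SSZ
start_date_str = '2025-01-31T01:00:00Z'  # This can be changed to any starting date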
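
Instead of hard-coding the date, the current day's 01:00:00Z start could be computed with datetime.replace. This is a minimal sketch: the helper name day_start and the fallback to the previous day when the script runs before 01:00 UTC are assumptions, not part of the original script. Python 3 rejects integer literals with a leading zero, so the arguments are written as hour=1 rather than hour=01.

```python
from datetime import datetime, timezone, timedelta

def day_start(now, hour=1):
    """Return the most recent occurrence of hour:00:00 at or before 'now'."""
    start = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if start > now:
        # Before the start hour: the current "day" began yesterday
        start -= timedelta(days=1)
    return start

now_utc = datetime.now(timezone.utc)
start_date = day_start(now_utc)
start_date_str = start_date.strftime('%Y-%m-%dT%H:%M:%SZ')

print(f"Start Date: {start_date_str}")
```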

Why Change the Start Time?

Avoids unnecessary requests if your data only starts generating later in the day.
Helps focus on business hours instead of midnight resets.

How to Retrieve Data for a Few Days, Weeks, or Months

To retrieve historical data spanning multiple days, weeks, or even months, you simply need to adjust the start_date and end_date values.

You can manually set a fixed range (e.g., a few days, weeks, or months) by modifying start_date_str and end_date_str:

Fetching Data for 10 Days (Jan 1 - Jan 11, 2025)

from datetime import datetime, timezone, timedelta

# Define a custom date range
start_date_str = '2025-01-01T01:00:00Z'  # Start fetching from Jan 1, 2025
end_date_str = '2025-01-11T01:00:00Z'    # Stop fetching at Jan 11, 2025

# Convert to datetime objects
start_date = datetime.fromisoformat(start_date_str.rstrip('Z'))
end_date = datetime.fromisoformat(end_date_str.rstrip('Z'))

print(f"Start Date: {start_date}")
print(f"End Date: {end_date}")

# Set the interval length
interval_length = timedelta(days=1)  # Fetch data in 1-day chunks

Passing these values to generate_date_intervals yields ten 1-day intervals covering 2025-01-01T01:00:00Z to 2025-01-11T01:00:00Z.
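
The loop below sketches how this date range might drive the interval generator. The function fetch_audit_events is a hypothetical placeholder for the real API request, not a Virtru API call, and the day-based folder name format is also an assumption.

```python
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = '%Y-%m-%dT%H:%M:%SZ'

def generate_date_intervals(start_date, end_date, delta):
    """Yield (interval_start, interval_end) pairs covering the range."""
    current_date = start_date
    while current_date < end_date:
        interval_end = min(current_date + delta, end_date)
        yield (current_date, interval_end)
        current_date = interval_end

def fetch_audit_events(start_str, end_str):
    """Hypothetical placeholder: replace with the real API request."""
    return []

start_date = datetime.fromisoformat('2025-01-01T01:00:00')
end_date = datetime.fromisoformat('2025-01-11T01:00:00')
interval_length = timedelta(days=1)

requested = []
for interval_start, interval_end in generate_date_intervals(
        start_date, end_date, interval_length):
    start_str = interval_start.strftime(TIMESTAMP_FORMAT)
    end_str = interval_end.strftime(TIMESTAMP_FORMAT)
    events = fetch_audit_events(start_str, end_str)
    folder = interval_start.strftime('%Y-%m-%d')  # assumed day-based folder name
    requested.append((start_str, end_str, folder, len(events)))

print(f"Issued {len(requested)} requests")
```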

Fetch Data for Different Time Ranges

You can modify the date range and interval length based on your needs.

Fetch Data for a Few Days

start_date_str = '2025-01-01T01:00:00Z'
end_date_str = '2025-01-06T01:00:00Z'  # 5-day range

interval_length = timedelta(days=1)  # Daily data

Fetch Data for a Few Weeks

start_date_str = '2025-01-01T01:00:00Z'
end_date_str = '2025-01-22T01:00:00Z'  # 3-week range

interval_length = timedelta(days=7)  # Weekly data

Fetch Data for a Few Months

start_date_str = '2024-10-01T01:00:00Z'
end_date_str = '2025-01-01T01:00:00Z'  # 3-month range

interval_length = timedelta(days=30)  # Fetch data in 30-day chunks (timedelta has no month unit)
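
With timedelta(days=30), this 92-day range is split into three 30-day intervals plus a final 2-day interval, so the boundaries drift away from the first of each month. If calendar-month boundaries matter, a sketch along the following lines could be used instead. The helper names add_one_month and generate_month_intervals are assumptions for illustration, and the approach only works when the start day of the month is 28 or lower.

```python
from datetime import datetime

def add_one_month(dt):
    """Move dt forward one calendar month (day of month must be <= 28)."""
    if dt.day > 28:
        raise ValueError("day of month must be 28 or lower")
    year = dt.year + dt.month // 12   # roll over to next year after December
    month = dt.month % 12 + 1
    return dt.replace(year=year, month=month)

def generate_month_intervals(start_date, end_date):
    """Yield (interval_start, interval_end) pairs, one per calendar month."""
    current_date = start_date
    while current_date < end_date:
        interval_end = min(add_one_month(current_date), end_date)
        yield (current_date, interval_end)
        current_date = interval_end

start_date = datetime.fromisoformat('2024-10-01T01:00:00')
end_date = datetime.fromisoformat('2025-01-01T01:00:00')

month_intervals = list(generate_month_intervals(start_date, end_date))
for interval_start, interval_end in month_intervals:
    print(interval_start.isoformat(), "->", interval_end.isoformat())
```

For the range above, this produces three intervals that start on October 1, November 1, and December 1.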

Conclusion

Refining data retrieval by adjusting time intervals, start times, and retrieval order helps:

Reduce unnecessary data processing
Improve data organization
Ensure efficient analysis and storage
Optimize API queries

By using dynamic time-based queries, you can fetch exactly the data you need without overloading your system.