@ -8,6 +8,7 @@ All data collected by the application needs to be exported to the **Consolidated
This is done via XML exports saved in an **S3 bucket**.
Currently, we export the following:
- **Lettings logs**
- **Users**
- **Organisations**
@ -54,9 +55,11 @@ The **master manifest** is a CSV file that lists all the collections generated d
The **collection archive** contains all files for a single collection (e.g., `2024 lettings logs` or `users`). This file is referenced in the master manifest. Each exported collection has its own archive.
- `pt001`: The part number. Each file contains up to `MAX_XML_RECORDS` (10,000) records. If an export contains more records than this, they are split across multiple files with incrementing part numbers.
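
The splitting rule can be illustrated with a minimal sketch. It assumes part labels follow the `pt001` pattern with a zero-padded, three-digit counter. The helper name `split_into_parts` is hypothetical and is not taken from the export services.

```ruby
# Illustrative only: mirrors the documented limit of 10,000 records per file.
MAX_XML_RECORDS = 10_000

# Hypothetical helper: maps part labels ("pt001", "pt002", ...) to record slices.
def split_into_parts(records, max_records = MAX_XML_RECORDS)
  records.each_slice(max_records).with_index(1).map do |slice, part_number|
    [format("pt%03d", part_number), slice]
  end.to_h
end

# Stand-in data: 25,000 integers in place of real records.
parts = split_into_parts((1..25_000).to_a)
parts.each { |label, slice| puts "#{label}: #{slice.size} records" }
```

With 25,000 records, the sketch produces three parts, the last holding the remaining 5,000 records.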
---
@ -91,15 +95,19 @@ The structure is the same, except the start and end years are hardcoded since th
## Navigating the Export Code
### `export_service.rb`
Orchestrates all exports and generates the master manifest. Check this file when adding new collections to the daily export or modifying the master manifest.
### `xml_export_service.rb`
Creates Export objects and writes them to S3. Use this file to see how Export objects are created, how export increment numbers are set, and how export records are batched, archived, and written.
### `{collection}_export_service.rb`
Individual collection export service files (e.g., `lettings_log_export_service.rb`) construct the data export XML content. Modify these files to add new data to an existing collection or change the format of existing fields.
### `{collection}_export_constants.rb`
These collection-specific files define the `EXPORT_FIELDS` constants. A field will not be exported unless added to this constant.
When adding new fields to year-specific exports, it is often necessary to include them starting from a specific year (typically the most recent). Constants like `POST_2024_EXPORT_FIELDS` are used for this purpose.
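
As a rough sketch of how year-specific constants might be combined, the example below assumes that `POST_2024_EXPORT_FIELDS` applies to collections starting in 2024 or later. The field names and both helper methods are hypothetical. The real lists live in the collection-specific constants files.

```ruby
# Hypothetical field names, for illustration only.
EXPORT_FIELDS = %w[id status startdate].freeze
POST_2024_EXPORT_FIELDS = %w[example_new_field].freeze

# Assumption: the extra fields apply from the 2024 collection year onward.
def export_fields_for(collection_start_year)
  fields = EXPORT_FIELDS.dup
  fields += POST_2024_EXPORT_FIELDS if collection_start_year >= 2024
  fields
end

# Keeps only the attributes that are listed for export in the given year.
def exportable_attributes(attributes, collection_start_year)
  attributes.slice(*export_fields_for(collection_start_year))
end

log = {
  "id" => 1,
  "status" => "completed",
  "startdate" => "2024-04-01",
  "example_new_field" => "x",
  "internal_note" => "not exported",
}

p exportable_attributes(log, 2023)
p exportable_attributes(log, 2024)
```

In this sketch, `internal_note` never reaches the export because it is not listed in either constant, and `example_new_field` appears only for 2024 onward.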
@ -111,12 +119,14 @@ When adding new fields to year-specific exports, it is often necessary to includ
Partial exports run daily, triggered via a cron job. These include all records updated since the last export.
To determine updated records, the service uses the `updated_at` and `values_updated_at` columns:
- **`updated_at`**: Updated whenever a record is edited through the service.
- **`values_updated_at`**: Set in the rare cases where records are updated manually in bulk without `updated_at` being changed. Not all collections include this field.
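
The selection rule above can be sketched in plain Ruby. The `ExportableRecord` struct and the `updated_since?` helper are hypothetical stand-ins for the real models and queries. The sketch also assumes a strict "later than the last export" comparison, which the actual service may handle differently.

```ruby
# Hypothetical stand-in for a model with the two timestamp columns.
ExportableRecord = Struct.new(:id, :updated_at, :values_updated_at, keyword_init: true)

# A record counts as updated if either timestamp is later than the last export.
# `compact` drops a nil `values_updated_at` for collections without that column.
def updated_since?(record, last_export_at)
  [record.updated_at, record.values_updated_at].compact.any? do |timestamp|
    timestamp > last_export_at
  end
end

last_export_at = Time.utc(2024, 6, 1)

records = [
  # Not touched since the last export.
  ExportableRecord.new(id: 1, updated_at: Time.utc(2024, 5, 20), values_updated_at: nil),
  # Edited through the service after the last export.
  ExportableRecord.new(id: 2, updated_at: Time.utc(2024, 6, 2), values_updated_at: nil),
  # Bulk-updated manually: only `values_updated_at` moved.
  ExportableRecord.new(id: 3, updated_at: Time.utc(2024, 5, 20), values_updated_at: Time.utc(2024, 6, 3)),
]

to_export = records.select { |record| updated_since?(record, last_export_at) }
puts to_export.map(&:id).inspect
```

Record 3 shows why the second column matters: its `updated_at` predates the last export, so it would be missed if only `updated_at` were checked.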
### Triggering a Partial Export
The easiest way to trigger a partial export is through the **Sidekiq** console:
1. Log in as a support user and navigate to `/sidekiq` in the service URL.
2. Go to the **Cron** tab (last tab in the top navigation).
3. Find the `data_export_xml` job (the only one listed) and click **Enqueue Now**.
@ -128,6 +138,7 @@ The easiest way to trigger a partial export is through the **Sidekiq** console:
A full re-export of an entire collection may be required if new fields are added or existing fields are re-coded.
Full exports can only be run via a **rake task**.
<!-- Update this section when sales exports are added, as they will affect rake tasks -->
If a collection is very large, a full export may fail because it runs out of memory. In that case, split the records into chunks of ~60,000 and run several partial exports over multiple days. The `values_updated_at` field can help with this.
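
One way to plan such a chunked re-export is sketched below, assuming one chunk is handled per daily run. The helper `plan_chunked_export` is hypothetical. Marking each chunk as updated (for example, by setting `values_updated_at` so the next partial export picks it up) is an assumption about how the field could be used, not a description of an existing rake task.

```ruby
# Approximate chunk size suggested in the text above.
CHUNK_SIZE = 60_000

# Hypothetical planner: one chunk of record ids per daily partial export.
def plan_chunked_export(record_ids, chunk_size = CHUNK_SIZE)
  record_ids.each_slice(chunk_size).with_index.map do |ids, day_offset|
    { day_offset: day_offset, ids: ids }
  end
end

# Stand-in data: 150,000 sequential ids in place of a real collection.
plan = plan_chunked_export((1..150_000).to_a)

plan.each do |chunk|
  puts "Day #{chunk[:day_offset]}: #{chunk[:ids].size} records " \
       "(ids #{chunk[:ids].first}..#{chunk[:ids].last})"
end
```

Each planned chunk would then be marked as updated shortly before the daily export runs, so that only that chunk is included in that day's partial export.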