* Modularise documentation * Add some background about the service * Add more instructions for local dependencies * Form builder docs * Stimulus and asset pipeline sections * Infrastructure setup * Add monitoring and logging * Init form runner * Export init * Testing * Testing * Update architecture image * Domain docs * Org relationships
pull/698/head
baarkerlounger
3 years ago
committed by
GitHub
17 changed files with 516 additions and 248 deletions
@ -0,0 +1,142 @@
|
||||
# **Developing locally on host machine** |
||||
|
||||
The most common way to run a development version of the application is to run it with local dependencies. |
||||
|
||||
Dependencies: |
||||
|
||||
- Ruby |
||||
- Rails |
||||
- PostgreSQL |
||||
- NodeJS |
||||
- Gecko driver (https://github.com/mozilla/geckodriver/releases) [for running Selenium tests] |
||||
|
||||
We recommend using RBenv to manage Ruby versions. |
||||
|
||||
1. Install PostgreSQL |
||||
|
||||
Mac OS: |
||||
```bash |
||||
brew install postgresql |
||||
brew services start postgresql |
||||
``` |
||||
|
||||
Linux (Debian): |
||||
```bash |
||||
sudo apt install -y postgresql postgresql-contrib libpq-dev |
||||
sudo systemctl start postgresql |
||||
``` |
||||
|
||||
2. Create a Postgres user |
||||
```bash |
||||
sudo su - postgres -c "createuser <username> -P" |
||||
``` |
||||
|
||||
3. Install RBenv & Ruby-build |
||||
|
||||
Mac OS: |
||||
```bash |
||||
brew install rbenv |
||||
rbenv init |
||||
mkdir -p ~/.rbenv/plugins |
||||
git clone https://github.com/rbenv/ruby-build.git ~/.rbenv/plugins/ruby-build |
||||
``` |
||||
|
||||
Linux (Debian): |
||||
```bash |
||||
sudo apt install -y rbenv |
||||
echo 'export PATH="/usr/local/rbenv/bin:$PATH"' >> ~/.bashrc |
||||
rbenv init |
||||
echo "# Load RBenv" >> ~/.bashrc |
||||
echo 'eval "$(rbenv init -)"' >> ~/.bashrc |
||||
mkdir -p ~/.rbenv/plugins |
||||
git clone https://github.com/rbenv/ruby-build.git ~/.rbenv/plugins/ruby-build |
||||
``` |
||||
|
||||
4. Install Ruby & Bundler |
||||
|
||||
```bash |
||||
rbenv install 3.1.2 |
||||
rbenv global 3.1.2 |
||||
gem install bundler |
||||
``` |
||||
|
||||
5. Install JavaScript dependencies |
||||
|
||||
Mac OS: |
||||
```bash |
||||
brew install node |
||||
brew install yarn |
||||
``` |
||||
|
||||
Linux (Debian): |
||||
```bash |
||||
curl -sL https://deb.nodesource.com/setup_16.x | sudo -E bash - |
||||
sudo apt -y install nodejs |
||||
mkdir -p "$HOME/.npm-packages" |
||||
npm config set prefix "$HOME/.npm-packages" |
||||
echo 'NPM_PACKAGES="$HOME/.npm-packages"' >> ~/.bashrc |
||||
echo 'export PATH="$PATH:$NPM_PACKAGES/bin"' >> ~/.bashrc |
||||
npm install --global yarn |
||||
``` |
||||
|
||||
6. Clone the repo |
||||
```bash |
||||
git clone git@github.com:communitiesuk/submit-social-housing-lettings-and-sales-data.git |
||||
``` |
||||
|
||||
|
||||
## App setup (OS agnostic) |
||||
|
||||
1. Copy the `.env.example` to `.env` and replace the database credentials with your local postgres user credentials. |
||||
|
||||
2. Install the dependencies:\ |
||||
`bundle install && yarn install` |
||||
|
||||
3. Create the database & run migrations:\ |
||||
`rake db:create db:migrate` |
||||
|
||||
4. Seed the database if required:\ |
||||
`rake db:seed` |
||||
|
||||
5. Start the dev servers |
||||
|
||||
a. Using foreman:\ |
||||
`./bin/dev` |
||||
|
||||
b. Individually:\ |
||||
|
||||
i. Rails:\ |
||||
`bundle exec rails s` |
||||
|
||||
ii. JS (for hot reloading):\ |
||||
`yarn build --mode=development --watch` |
||||
|
||||
If you're not modifying front-end assets, you can bundle them as a one-off task:\ |
||||
`yarn build --mode=development` |
||||
|
||||
Development mode will target the latest versions of Chrome, Firefox and Safari for transpilation while production mode will target older browsers. |
||||
|
||||
The Rails server will start on <http://localhost:3000>. |
||||
|
||||
Running the test suite (front end assets need to be built or server needs to be running):\ |
||||
`bundle exec rspec` |
||||
|
||||
|
||||
# **Using Docker** |
||||
|
||||
|
||||
1. Build the image:\ |
||||
`docker-compose build` |
||||
|
||||
2. Run the database migrations:\ |
||||
`docker-compose run --rm app /bin/bash -c 'rake db:migrate'` |
||||
|
||||
3. Seed the database if required:\ |
||||
`docker-compose run --rm app /bin/bash -c 'rake db:seed'` |
||||
|
||||
4. To be able to debug with Pry run the app using:\ |
||||
`docker-compose run --service-ports app` |
||||
|
||||
If this is not needed you can run `docker-compose up` as normal. |
||||
|
||||
The Rails server will start on <http://localhost:8080>. |
@ -0,0 +1,15 @@
|
||||
# CDS exports |
||||
|
||||
All data collected by the application needs to be exported to the Consolidated Data Store (CDS) which is a data warehouse based on MS SQL running in the DAP (Data Analytics Platform). |
||||
|
||||
This is done via XML exports saved in an S3 bucket located in the DAP VPC, using dedicated credentials shared out of band. The data mapping for this export can be found in `app/services/exports/case_log_export_service.rb`. Initially the application database field names and field types were chosen to match the existing CDS data as closely as possible, to minimise the amount of transformation needed. This has led to a less than optimal data model, however, and increasingly we should look to transform at the mapping layer where that benefits our application. |
||||
|
||||
The export service is triggered nightly using [Gov PaaS tasks](https://docs.cloudfoundry.org/devguide/using-tasks.html). These tasks are triggered from a GitHub Action, as Gov PaaS does not currently support the Cloud Foundry Task Scheduler. |
||||
|
||||
The S3 bucket is located in the DAP VPC rather than the application VPC because DAP runs directly in an AWS account, so access to the S3 bucket can be restricted to only the IPs used by the application. This is not possible the other way around, as Gov PaaS does not support restricting S3 access by IP (https://github.com/alphagov/paas-roadmap/issues/107). |
||||
|
||||
## Other options previously considered: |
||||
|
||||
- CDC replication using a managed service such as [AWS DMS](https://aws.amazon.com/dms/) |
||||
- Would require VPC peering which Gov PaaS does not currently support (https://github.com/alphagov/paas-roadmap/issues/105) |
||||
- Would require CDS to make changes to their ingestion model |
@ -0,0 +1,151 @@
|
||||
## Single log submission form configuration |
||||
|
||||
### Background |
||||
|
||||
Lettings and sales of social housing data is collected in annual "collection windows" that run from 1st April to 1st April. During a window the form and questions generally stay constant. The form will generally change by small amounts between each collection window. Typical changes are adding new questions, adding or removing answer options from questions, or tweaking question wording for clarity. |
||||
|
||||
A paper form is produced for guidance and to help data providers collect the data offline, and a bulk upload template is circulated. Both need to match the online form. |
||||
|
||||
Data is accepted for a collection window for up to 3 months after it has finished, to allow for late data submission. This means that between April and July two versions of the form run simultaneously. |
||||
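
The overlap between windows can be sketched in a few lines of Ruby. This is an illustration only, not code from the application: `open_collection_start_years` and `LATE_SUBMISSION_MONTHS` are hypothetical names, and treating the late-submission deadline as exclusive (closed from 1st July) is an assumption.

```ruby
require "date"

# Assumption: late data is accepted for three calendar months after a window ends.
LATE_SUBMISSION_MONTHS = 3

# Hypothetical helper: returns the start years of every collection window
# that accepts submissions on the given date.
def open_collection_start_years(date)
  # The window containing `date` started on the most recent 1st April.
  current_start_year = date.month >= 4 ? date.year : date.year - 1

  # The previous window ended on that same 1st April and stays open
  # for late submissions until three months later.
  late_deadline = Date.new(current_start_year, 4, 1) >> LATE_SUBMISSION_MONTHS

  years = [current_start_year]
  years.unshift(current_start_year - 1) if date < late_deadline
  years
end

open_collection_start_years(Date.new(2022, 5, 15)) # => [2021, 2022]
open_collection_start_years(Date.new(2022, 9, 1))  # => [2022]
```
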
|
||||
Other considerations that went into our design are being able to re-use as much of this solution as possible for other data collections, and possibly having the ability to generate the form and/or form changes from a UI. |
||||
|
||||
We haven't used micro-services, preferring to deploy a single application for CLDC, but we have modelled the form itself as configuration: a JSON structure that acts as a sort of DSL/form builder for the form. The idea is to decouple the code that creates the required routes, controller methods, views etc. to display the form from the actual wording of questions or order of pages, so that it becomes possible to change the form with little or no code change. |
||||
|
||||
This should also mean that in the future it could be possible to create a UI that can construct the JSON config, which would open up the ability to make form changes to a wider audience. Doing this fully would require generating and running the necessary migrations for data storage, generating the required ActiveRecord methods to validate the data server side, and generating/updating API endpoints and documentation. All of this is likely to be beyond the scope of the initial MVP but could be looked at in the future. |
||||
|
||||
Since initially the JSON config will not create database migrations or ActiveRecord model validations, it will instead assume that these have been correctly created for the config provided. This rests on the following assumptions: |
||||
|
||||
- The form will be tweaked regularly (amending questions wording, changing the order of questions or the page a question is displayed on) |
||||
- The actual data collected will change very infrequently. Time series continuity is very important to ADD (Analysis and Data Directorate), so the actual data collected should stay largely consistent, i.e. in general we can change the question wording in ways that make the intent clearer or easier to understand, but not in ways that would make the data provider give a different answer. |
||||
|
||||
A form parser class will parse this config into Ruby objects/methods that can be used as an API by the rest of the application, such that we could change the underlying config if needed (for example swap JSON for YAML or for database objects) without needing to change the rest of the application. We'll call this the "Form Runner" part of the application. |
||||
|
||||
### Setup this log |
||||
|
||||
The setup this log section is treated slightly differently from the rest of the form. It is more accurately viewed as providing metadata about the form than as being part of the form itself. It also needs to know far more about the application specific context than other parts of the form such as who the current user is, what organisation they're part of and what role they have etc. |
||||
|
||||
As a result it's not modelled as part of the config but rather as code. It still uses the same "Form Runner" components though. |
||||
|
||||
### Features the Form Config supports |
||||
|
||||
- Defining sections, subsections, pages and questions that fit the GovUK tasklist pattern |
||||
- Auto-generated routes - URLs are automatically created from dasherized page names |
||||
- Data persistence requires a database field to exist which matches the name/id for each question (and answer option for checkbox questions) |
||||
- Text, Numeric, Date, Radio, Select and Checkbox question types |
||||
- Conditional questions (`conditional_for`) - Radio and Checkbox questions can support "conditional" text or numeric questions that show/hide on the same page when the triggering option is selected |
||||
- Routing (`depends_on`) - all pages can specify conditions (attributes of the case log) that determine whether or not they're shown to the user |
||||
- Methods can be chained (i.e. you can have conditions in the form `{ owning_organisation.provider_type: "local_authority" }`) which will call `case_log.owning_organisation.provider_type` and compare the result to the provided value. |
||||
- Numeric questions support math expression depends_on conditions such as `{ age2: ">16" }` |
||||
- By default questions on pages that are not routed to are assumed to be invalid and are cleared. This can be prevented by setting `derived: true` on a question. |
||||
- Questions can be optionally hidden from the check answers page of each section by setting `hidden_in_check_answers: true`. This can also take a condition. |
||||
- Questions can be set as being inferred from other answers. This is similar to derived with the difference being that derived questions can be derived from anything not just other form question answers, and inferred answers are cleared when the answers they depend on change, whereas derived questions aren't. |
||||
- Soft validation interruption pages can be included |
||||
- For complex HTML guidance, partials can be referenced |
||||
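
To make the `depends_on` behaviour above more concrete, here is a minimal Ruby sketch of how chained keys and numeric expressions could be evaluated. It is not the application's implementation: `Organisation`, `Log`, `resolve`, `condition_met?` and `routed_to?` are hypothetical stand-ins, and the rule that a page is shown when any hash in the array has all of its conditions met is an assumption.

```ruby
# Hypothetical stand-ins for the real ActiveRecord models.
Organisation = Struct.new(:provider_type)
Log = Struct.new(:owning_organisation, :age2, :needstype)

# Resolves a dotted key such as "owning_organisation.provider_type" by
# calling each method in turn; yields nil if any link in the chain is nil.
def resolve(log, key)
  key.split(".").reduce(log) { |object, method_name| object&.public_send(method_name) }
end

# Compares a value with an expected answer, or with a numeric expression
# such as ">16" when the expected value is written that way.
def condition_met?(value, expected)
  match = expected.is_a?(String) && expected.match(/\A(>=|<=|==|>|<)\s*(\d+)\z/)
  return value == expected unless match
  return false if value.nil?

  value.public_send(match[1], match[2].to_i)
end

# Assumption: a page is shown if ANY hash in depends_on has ALL conditions met.
def routed_to?(log, depends_on)
  return true if depends_on.nil? || depends_on.empty?

  depends_on.any? do |conditions|
    conditions.all? { |key, expected| condition_met?(resolve(log, key), expected) }
  end
end

log = Log.new(Organisation.new("local_authority"), 17, 1)
routed_to?(log, [{ "owning_organisation.provider_type" => "local_authority" }]) # => true
routed_to?(log, [{ "age2" => "<16" }])                                          # => false
```
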
|
||||
### JSON Config |
||||
|
||||
The form for this is driven by a JSON file in `/config/forms/{start_year}_{end_year}.json` |
||||
|
||||
The JSON should follow the structure: |
||||
|
||||
```jsonc |
||||
{ |
||||
"form_type": "lettings" / "sales", |
||||
"start_year": Integer, // e.g. 2020 |
||||
"end_year": Integer, // e.g. 2021 |
||||
"sections": { |
||||
"[snake_case_section_name_string]": { |
||||
"label": String, |
||||
"description": String, |
||||
"subsections": { |
||||
"[snake_case_subsection_name_string]": { |
||||
"label": String, |
||||
"pages": { |
||||
"[snake_case_page_name_string]": { |
||||
"header": String, |
||||
"description": String, |
||||
"questions": { |
||||
"[snake_case_question_name_string]": { |
||||
"header": String, |
||||
"hint_text": String, |
||||
"check_answer_label": String, |
||||
"type": "text" / "numeric" / "radio" / "checkbox" / "date", |
||||
"min": Integer, // numeric only |
||||
"max": Integer, // numeric only |
||||
"step": Integer, // numeric only |
||||
"width": 2 / 3 / 4 / 5 / 10 / 20, // text and numeric only |
||||
"prefix": String, // numeric only |
||||
"suffix": String, // numeric only |
||||
"answer_options": { // checkbox and radio only |
||||
"0": String, |
||||
"1": String |
||||
}, |
||||
"conditional_for": { |
||||
"[snake_case_question_to_enable_1_name_string]": ["condition-that-enables"], |
||||
"[snake_case_question_to_enable_2_name_string]": ["condition-that-enables"] |
||||
}, |
||||
"inferred_answers": { "field_that_gets_inferred_from_current_field": { "is_that_field_inferred": true } }, |
||||
"inferred_check_answers_value": { |
||||
"condition": { "field_name_for_inferred_check_answers_condition": "field_value_for_inferred_check_answers_condition" }, |
||||
"value": "Inferred value that gets displayed if condition is met" |
||||
} |
||||
} |
||||
}, |
||||
"depends_on": [{ "question_key": "answer_value_required_for_this_page_to_be_shown" }] |
||||
} |
||||
} |
||||
} |
||||
} |
||||
} |
||||
} |
||||
} |
||||
``` |
||||
|
||||
Assumptions made by the format: |
||||
|
||||
- All forms have at least 1 section |
||||
- All sections have at least 1 subsection |
||||
- All subsections have at least 1 page |
||||
- All pages have at least 1 question |
||||
- The ActiveRecord case log model has a field for each question name (must match). In the case of checkbox questions it must have one field for every answer option (again names must match). |
||||
- Text not required by a page/question such as a header or hint text should be passed as an empty string |
||||
- For conditionally shown questions, conditions that have been implemented and can be used are: |
||||
- Radio question answer option selected matches one of the conditional values e.g. ["answer-options-1-string", "answer-option-3-string"] |
||||
- Numeric question value matches condition e.g. [">2"], ["<7"] or ["== 6"] |
||||
- When the top level question is a radio button and the conditional question is a numeric, text or date field then the conditional question is shown inline |
||||
- When the conditional question is a radio, checkbox or select field it should be displayed on its own page and "depends_on" should be used rather than "conditional_for" |
||||
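
The `conditional_for` matching described above might look roughly like the following Ruby sketch. The helper names (`enabled_conditionals`, `matches?`) and the question names are hypothetical, and the real application may evaluate these conditions differently (for example in the browser).

```ruby
# Checks one condition string against the answer given to the parent question.
# Numeric expressions (">2", "<7", "== 6") are compared as integers; anything
# else is treated as an answer option that must match exactly.
def matches?(condition, answer)
  numeric = condition.match(/\A(==|>|<)\s*(\d+)\z/)
  return answer.to_s == condition unless numeric
  return false unless answer.is_a?(Numeric)

  answer.public_send(numeric[1], numeric[2].to_i)
end

# Returns the names of the conditional questions that should be revealed
# for the given answer to the parent question.
def enabled_conditionals(conditional_for, answer)
  revealed = conditional_for.select do |_question, conditions|
    conditions.any? { |condition| matches?(condition, answer) }
  end
  revealed.keys
end

# Hypothetical config fragment, in the shape of "conditional_for" above.
conditional_for = {
  "other_reason" => ["Other"],
  "household_size_detail" => [">2"]
}

enabled_conditionals(conditional_for, "Other") # => ["other_reason"]
enabled_conditionals(conditional_for, 3)       # => ["household_size_detail"]
```
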
|
||||
Page routing: |
||||
|
||||
- Form navigation works by stepping sequentially through every page defined in the JSON form definition for the given subsection. For every page it checks if it has "depends_on" conditions. If it does, it evaluates them to determine whether that page should be shown or not. |
||||
|
||||
- In this way we can build up whole branches by having: |
||||
|
||||
```jsonc |
||||
"page_1": { "questions": { "question_1": { "answer_options": ["A", "B"] } } }, |
||||
"page_2": { "questions": { "question_2": { "answer_options": ["C", "D"] } }, "depends_on": [{ "question_1": "A" }] }, |
||||
"page_3": { "questions": { "question_3": { "answer_options": ["E", "F"] } }, "depends_on": [{ "question_1": "A" }] }, |
||||
"page_4": { "questions": { "question_4": { "answer_options": ["G", "H"] } }, "depends_on": [{ "question_1": "B" }] } |
||||
``` |
||||
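
As a rough illustration of the sequential routing described above, the following Ruby sketch walks the four example pages and finds the next page to show. `shown?` and `next_page` are hypothetical helpers, not the application's API, and the any/all reading of `depends_on` is an assumption.

```ruby
require "json"

# The four example pages, written out as valid JSON.
PAGES = JSON.parse(<<~CONFIG)
  {
    "page_1": { "questions": { "question_1": { "answer_options": ["A", "B"] } } },
    "page_2": { "questions": { "question_2": { "answer_options": ["C", "D"] } }, "depends_on": [{ "question_1": "A" }] },
    "page_3": { "questions": { "question_3": { "answer_options": ["E", "F"] } }, "depends_on": [{ "question_1": "A" }] },
    "page_4": { "questions": { "question_4": { "answer_options": ["G", "H"] } }, "depends_on": [{ "question_1": "B" }] }
  }
CONFIG

# A page without depends_on is always shown; otherwise at least one
# condition set must be fully satisfied by the answers given so far.
def shown?(page, answers)
  conditions = page["depends_on"]
  return true if conditions.nil?

  conditions.any? { |set| set.all? { |question, value| answers[question] == value } }
end

# Steps through the pages after `current` in definition order and returns
# the name of the first one that should be shown, or nil at the end.
def next_page(pages, current, answers)
  names = pages.keys
  names.drop(names.index(current) + 1).find { |name| shown?(pages[name], answers) }
end

next_page(PAGES, "page_1", { "question_1" => "A" }) # => "page_2"
next_page(PAGES, "page_1", { "question_1" => "B" }) # => "page_4"
```

Answering "A" leads through the `page_2` and `page_3` branch, while answering "B" skips straight to `page_4`.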
|
||||
### JSON form validation against Schema |
||||
|
||||
To validate the form JSON against the schema you can run:\ |
||||
`rake form_definition:validate["config/forms/2021_2022.json"]` |
||||
|
||||
n.b. You may have to escape square brackets in zsh\ |
||||
`rake form_definition:validate\["config/forms/2021_2022.json"\]` |
||||
|
||||
This will validate the given form definition against the schema in `config/forms/schema/generic.json`. |
||||
|
||||
You can also run:\ |
||||
`rake form_definition:validate_all` |
||||
|
||||
This will validate all forms in the `config/forms` and `spec/fixtures/forms` directories. |
||||
|
||||
### Improvements that could be made |
||||
|
||||
- JSON schema definition could be expanded such that we can better automatically validate that a given config is valid and internally consistent |
||||
- Generators could parse a given valid JSON form and generate the required database migrations to ensure all the expected fields exist and are of a compatible type |
||||
- The parsed form could be visualised using something like GraphViz to help manually verify the coded config meets requirements |
@ -0,0 +1,19 @@
|
||||
# Form Runner |
||||
|
||||
The form runner is composed of: |
||||
|
||||
Ruby Classes: |
||||
- A singleton form handler that instantiates an instance of each form definition (each config file we have) combined with the "setup" section that is common to all forms. This is created at Rails boot time. (`app/models/form_handler.rb`) |
||||
- A Form class that is the entry point for parsing a form definition and handles most of the associated logic (`app/models/form.rb`) |
||||
- Section, Subsection, Page and Question classes (`app/models/form/`) |
||||
- Setup subsection specific instances (subclasses) of Section, Subsection, Pages and Questions (`app/form/setup/`) |
||||
|
||||
ERB Templates: |
||||
- The page view which is the main view for each form page (`app/views/form/page.html.erb`) |
||||
- Partials for each question type (radio, checkbox, select, text, numeric, date) (`app/views/form/`) |
||||
- Partials for specific question guidance (`app/views/form/guidance`) |
||||
- The check answers page which is the view for the answer summary page of each section (`app/views/form/check_answers.html.erb`) |
||||
|
||||
Routes for each form page are generated by looping over each Page instance in each Form instance held by the Form Handler and defining a "Get" path. The corresponding controller method is also auto-generated with meta-programming via the same looping in `app/controllers/form_controller.rb` |
||||
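
The looping and meta-programming described above can be sketched without Rails. This is a simplified illustration, not the contents of `form_controller.rb`: `Page`, `SketchFormController`, `page_paths` and the page ids are all hypothetical.

```ruby
# Hypothetical stand-in for a parsed form page.
Page = Struct.new(:id)

class SketchFormController
  # Assumption: in the real application these come from the Form Handler.
  PAGES = [Page.new("tenant_code"), Page.new("household_members")].freeze

  # Define one action per page by looping over the page instances.
  PAGES.each do |page|
    define_method(page.id) { "rendering form/page for #{page.id}" }
  end
end

# Builds a path => action lookup, using the dasherized page id as the URL.
def page_paths(pages)
  pages.to_h { |page| ["/#{page.id.tr('_', '-')}", page.id.to_sym] }
end

page_paths(SketchFormController::PAGES)
# => {"/tenant-code"=>:tenant_code, "/household-members"=>:household_members}
SketchFormController.new.tenant_code # => "rendering form/page for tenant_code"
```
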
|
||||
All form pages submit to the same controller method (`app/controllers/form_controller.rb#submit_form`) which validates and persists the data, and then redirects to the next form page that identifies as "routed_to" given the current case log state. |
@ -0,0 +1,13 @@
|
||||
# Infrastructure Metric monitoring |
||||
|
||||
We use self-hosted Prometheus and Grafana for monitoring infrastructure metrics. These are run in a dedicated Gov PaaS space called "monitoring" and are deployed as Docker images using GitHub Actions pipelines. The repository for these and more information is here: [dluhc-data-collection-monitoring](https://github.com/communitiesuk/dluhc-data-collection-monitoring). |
||||
|
||||
# Application & Performance monitoring & alerting |
||||
|
||||
For application error and performance monitoring we use managed [Sentry](https://sentry.io/organizations/dluhc-core). You will need to be added to the DLUHC account to access this. It triggers Slack notifications to the #team-data-collection-alerts channel for all application errors in staging and production, and for any controller endpoints that have a P95 transaction duration > 250ms over a 24 hour period. |
||||
|
||||
# Logs |
||||
|
||||
For log persistence we use a managed ELK (Elasticsearch, Logstash, Kibana) stack provided by [Logit](https://logit.io/). You will need to be added to the DLUHC account to access this. Logs are retained for 14 days with a daily limit of 2GB. |
||||
|
||||
Logs are also available from Gov PaaS directly via the CLI: `cf logs <gov-paas-app-name> --recent`. |
@ -0,0 +1,23 @@
|
||||
# Definitions |
||||
|
||||
- **Stock owning organisation (parent)**: an organisation that owns housing stock. It may manage the allocation of people in and out of its accommodation, or it may contract this out to a managing agent (child). |
||||
|
||||
- **Managing agent (child)**: an organisation contracted to manage the stock and tenants on behalf of the organisation that owns the stock. ‘Managing agent’ means the same as ‘child’ and is the term more commonly used by data providing organisations. Parent/child is what we call them internally but is not a term that should be used with external customers. Managing agents are responsible for the allocation of people in and out of the accommodation, and/or responsible for the services provided to support those people in the accommodation (in the case of Supported Housing). |
||||
|
||||
# Permissions |
||||
|
||||
## Organisational relationships: |
||||
|
||||
Organisations that own stock can contract out the management of that stock to another organisation. This relationship is often referred to as a parent/child relationship. This is a useful analogy as a parent can have multiple children, and a child can have many parents. A child organisation can also be a parent, and a parent organisation can also be a child organisation: |
||||
|
||||
![Organisational relationships](images/organisational_relationships.png) |
||||
|
||||
The case logs that a user can see depend on their role: |
||||
|
||||
- Customer Support users can access any case log |
||||
- Data coordinators can access any case log for which the organisation they work for is ultimately responsible, meaning they can see logs managed by a child organisation |
||||
- Data providers can only access case logs that their organisation manages (or directly owns) |
||||
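
One way to read the role rules above is the following Ruby sketch. It is an interpretation under stated assumptions, not the application's authorisation code: `Org`, `CaseLog`, `User`, `visible_logs` and the role symbols are hypothetical, and "directly owns" is taken to mean the organisation both owns and manages the stock.

```ruby
# Hypothetical stand-ins for the real models.
Org = Struct.new(:name)
CaseLog = Struct.new(:id, :owning_organisation, :managing_organisation)
User = Struct.new(:role, :organisation)

# Filters the logs a user may see, based on their role and organisation.
def visible_logs(user, logs)
  case user.role
  when :support
    # Customer support can see every log.
    logs
  when :data_coordinator
    # Logs the organisation owns (even if a child manages them) or manages.
    logs.select do |log|
      [log.owning_organisation, log.managing_organisation].include?(user.organisation)
    end
  when :data_provider
    # Only logs the organisation itself manages.
    logs.select { |log| log.managing_organisation == user.organisation }
  else
    []
  end
end

parent = Org.new("Parent")
child = Org.new("Child")
logs = [CaseLog.new(1, parent, parent), CaseLog.new(2, parent, child)]

visible_logs(User.new(:data_coordinator, parent), logs).map(&:id) # => [1, 2]
visible_logs(User.new(:data_provider, child), logs).map(&:id)     # => [2]
```
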
|
||||
Taking the relationships from the above diagram, and looking at which logs each user can access: |
||||
|
||||
![User log access permissions](images/user_log_permissions.png) |
@ -0,0 +1,5 @@
|
||||
# Supported housing schemes |
||||
|
||||
- **Schemes**: Groups of similar properties in the same location, intended for similar tenants with the same type of support needs, managed in the same way. As some of the information we need about a new tenancy is the same for all new tenancies in the ‘scheme’, users can set up a ‘scheme’ in the CORE system by completing the information once. In Supported Housing forms, the user just supplies the appropriate scheme. This means providers do not have to complete identical information multiple times in each CORE form. Effectively we model these as "templates" or "predefined answer sets" |
||||
|
||||
- **Management groups**: Schemes are often managed together as part of a ‘management group’. An organisation may have multiple management groups, and each management group may have multiple schemes. For Supported Housing logs, users must select the management group first, then select the scheme. |
@ -0,0 +1,5 @@
|
||||
## Service |
||||
|
||||
All lettings and sales of social housing in England need to be logged with the Department for Levelling Up, Housing and Communities (DLUHC). This is done by Local Authorities and Housing Associations, who are the primary users of this service. Data is collected via a form that runs on an annual data collection window basis. Form changes are made annually to add new questions, remove any that are no longer needed, or adjust wording or answer options etc. Each data collection window runs from 1st April to 1st April, plus an extra 3 months to allow for any late submissions, meaning that between April and July two collection windows are open simultaneously and logs can be submitted for either. |
||||
|
||||
ADD (Analytics & Data Directorate) statisticians are the other primary users of the service. The data collected is transferred to DLUHC's data warehouse (CDS - Consolidated Data Store) via nightly exports to XML, which are transferred to S3 and ingested from there. CDS ingests and transforms the data, ultimately storing it in a MS SQL database and exposing it to analysts and statisticians via Amazon Workspaces. |
@ -0,0 +1,8 @@
|
||||
# Testing strategy |
||||
|
||||
- We use [RSpec](https://rspec.info/) and [Capybara](https://teamcapybara.github.io/capybara/) |
||||
- Capybara is used for our feature tests. These use the Rack driver by default (faster) or the Gecko driver (installation required) when the `js: true` option is passed for a test. |
||||
- Capybara is configured to run in headless mode but this can be toggled by commenting out `app/spec/rails_helper.rb#L14` |
||||
- Capybara is configured to use the Gecko driver for JS tests, as Chrome is more commonly used and so is naturally more likely to be well tested already, but this can be switched to the Chrome driver by changing `app/spec/rails_helper.rb#L13` |
||||
- Feature specs are generally written sparingly as they're also the slowest. Where possible a request spec is preferred, as this still tests a large surface area (route, controller, model, view) without the performance impact. Request specs are not suitable for tests that need to run JavaScript or test that a specific set of UI events triggers a specific set of requests (with high confidence). |
||||
- Test data is created with [FactoryBot](https://github.com/thoughtbot/factory_bot) wherever possible |
@ -0,0 +1,13 @@
|
||||
# External Users |
||||
|
||||
The primary users of the system are external data providing organisations: Local Authorities and Private Registered Providers (Housing Associations). These have two main user types: |
||||
|
||||
- Data Coordinators - administrators for their own organisation, can also complete logs |
||||
- Data Providers - complete the logs |
||||
|
||||
Additionally there are Data Protection Officers (DPOs). At some organisations this is a separate role, but in our codebase it is modelled as an attribute of the user (i.e. a data coordinator or provider can additionally be a DPO). They are responsible for ensuring the organisation has signed the data sharing agreement. |
||||
|
||||
# Internal users |
||||
|
||||
- Customer support (helpdesk) - can administrate all organisations |
||||
- ADD statisticians - primary consumers of the data collected via CDS/DAP |