How to prepare data for import into Practique when uploading a CSV or Excel file

Summary

Practique enforces strict rules on CSV/XLSX imports and does not attempt to recover or auto-correct user-provided input. This keeps the handling of input clear and deterministic, which in turn allows reliable outputs, data communication and exports.

Background

Practique's internal format for textual information is HTML. This HTML is the core content displayed to users on the Practique Server, Practique for iPad iOS and Practique for Browser platforms, and it is also exported to PDF and MS Word. This poses challenges when user input in CSV/XLSX files contains special characters, HTML reserved characters or invalid HTML fragments.

Practique must enforce strict, documented rules on the content of CSV/XLSX imports in order to reliably produce the desired outputs and feed data to client applications. If input is not in a valid format or does not follow the rules, Practique rejects it.

Practique does not auto-correct, auto-transform or infer any changes to imported data. This matters because any such operation would alter the input, and changes to the input must remain the user's responsibility so that the user retains full control over what is imported.

Rules

I. Input must be valid XML

Each value in the source file must be a valid XML fragment; Practique checks this with an XML validator. If a value contains plain text only, without any HTML formatting, its HTML reserved characters must still be encoded (see Rule III). If HTML markup is used for formatting, it must be well-formed. For example, <strong>This text is bold <em>this is italic as well</strong> is invalid markup because the <em> element is never closed.

If any value in the source file fails XML validation, the source file is rejected.
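A minimal sketch of the kind of check Rule I describes. Practique's own validator is not documented here, so this only approximates it: the function name is hypothetical, and it wraps a cell value in a made-up <root> element (a value may hold bare text or several sibling elements) before asking Python's standard xml.etree.ElementTree parser whether the result is well-formed.

```python
import xml.etree.ElementTree as ET


def is_valid_xml_fragment(value: str) -> bool:
    """Return True if `value` is well-formed XML once wrapped in a root element.

    The <root> wrapper is an illustrative assumption; this is not Practique's validator.
    """
    try:
        ET.fromstring(f"<root>{value}</root>")
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    # Properly nested markup passes.
    print(is_valid_xml_fragment("<strong>Bold <em>and italic</em></strong>"))  # True
    # The invalid example from Rule I: <em> is never closed.
    print(is_valid_xml_fragment("<strong>This text is bold <em>this is italic as well</strong>"))  # False
    # A bare ampersand is not well-formed either (see Rule III).
    print(is_valid_xml_fragment("Drag & Drop"))  # False
```

Because the check only covers well-formedness, a value can pass it and still be rejected under Rule V for using markup outside the allowed list.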

II. Input must be UTF-8 or UTF-16 encoded

The entire source file must be encoded in UTF-8 or UTF-16.

Practique checks the encoding of the source file first. It tries to validate the file as UTF-8 and, if that fails, as UTF-16. If both fail, the source file is rejected.
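The same UTF-8-then-UTF-16 order can be reproduced before uploading. This is a sketch, not Practique's code; the function name is hypothetical. It uses Python's "utf-8-sig" codec, which reads plain UTF-8 and also drops a leading UTF-8 byte order mark that some spreadsheet tools write. Whether Practique treats that mark the same way is an assumption.

```python
def decode_source(raw: bytes) -> str:
    """Decode raw file bytes, trying UTF-8 first and UTF-16 second (the order in Rule II).

    Treating a UTF-8 byte order mark as acceptable is an assumption.
    Raises ValueError when neither encoding works, i.e. the file would be rejected.
    """
    for encoding in ("utf-8-sig", "utf-16"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next encoding in the list
    raise ValueError("source file is neither valid UTF-8 nor valid UTF-16")


if __name__ == "__main__":
    text = "Cholesterol level &gt; 500"
    print(decode_source(text.encode("utf-8")) == text)   # True
    print(decode_source(text.encode("utf-16")) == text)  # True (the BOM selects byte order)
```

The UTF-16 fallback is permissive: most even-length byte strings decode as UTF-16 even when they were written in some other encoding, so a successful decode is not proof the file is really UTF-16. Saving the file explicitly as UTF-8 avoids the ambiguity.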

III. Input must have HTML reserved characters (<, >, &) encoded as HTML entities (&lt;, &gt;, &amp;)
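For plain-text values, Python's standard html.escape performs exactly this encoding. A minimal sketch with a hypothetical wrapper name; quote=False is passed because Rule III names only &, < and >, so quotation marks are left alone.

```python
from html import escape


def encode_reserved(text: str) -> str:
    """Encode &, < and > in a plain-text value as &amp;, &lt; and &gt; (Rule III).

    html.escape replaces & first, so the entities it inserts are not encoded again.
    """
    return escape(text, quote=False)


if __name__ == "__main__":
    print(encode_reserved("What are consequences of cholesterol level > 500"))
    # What are consequences of cholesterol level &gt; 500
```

Apply it only to plain-text values, and only once: run on HTML markup it would turn the tags into literal text, and run twice it would turn &amp; into &amp;amp;.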

IV. Special characters must be UTF-8 or UTF-16 characters
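XML predefines only five named entities (&amp;, &lt;, &gt;, &quot; and &apos;), so HTML-only names such as &eacute; or &nbsp; are undefined in XML and would fail the check in Rule I. Assuming Rule IV means that such characters should appear as the literal Unicode character instead, the hypothetical helper below converts HTML named entities before upload. It looks names up in Python's html.entities.name2codepoint table (HTML 4 entity names) and leaves the five XML entities untouched so that Rule III encoding survives.

```python
import re
from html.entities import name2codepoint

# XML predefines only these five named entities; they are kept as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
NAMED_REFERENCE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_entities_to_characters(value: str) -> str:
    """Replace HTML named entities (e.g. &eacute;) with the Unicode character they stand for."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # XML-safe or unknown name: leave unchanged
        return chr(name2codepoint[name])

    return NAMED_REFERENCE.sub(replace, value)


if __name__ == "__main__":
    print(named_entities_to_characters("Caf&eacute; &amp; bar"))  # Café &amp; bar
```

The converted text must then be saved in UTF-8 or UTF-16 (Rule II) so the new characters are stored correctly.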

V. Formatting must be expressed using allowed HTML markup

Practique supports the following HTML markup: sub, sup, em, strong, del, table (with tr and td), ol, ul, li, p, br, pre, code and MathML.

Any other markup will be rejected.
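A minimal sketch of an allowed-markup check for Rule V. It assumes the value has already passed Rule I, treats a <math> element as the entry point for MathML without inspecting its contents, and does not check nesting (for example, that tr appears only inside table). The names ALLOWED_TAGS and disallowed_tags and the MathML handling are illustrative assumptions, not Practique's implementation.

```python
import xml.etree.ElementTree as ET

# Tags listed under Rule V. MathML is represented only by its <math> root element;
# how Practique validates the inside of MathML is not covered here (assumption).
ALLOWED_TAGS = {
    "sub", "sup", "em", "strong", "del", "table", "tr", "td",
    "ol", "ul", "li", "p", "br", "pre", "code", "math",
}


def disallowed_tags(value: str) -> set[str]:
    """Return the tags used in `value` that are not in ALLOWED_TAGS (empty set means OK).

    Raises xml.etree.ElementTree.ParseError if the value is not well-formed (Rule I).
    """
    root = ET.fromstring(f"<root>{value}</root>")
    found: set[str] = set()

    def walk(element: ET.Element) -> None:
        for child in element:
            tag = child.tag.split("}")[-1]  # drop a namespace prefix such as MathML's
            if tag not in ALLOWED_TAGS:
                found.add(tag)
            if tag != "math":  # MathML content is not inspected
                walk(child)

    walk(root)
    return found


if __name__ == "__main__":
    print(disallowed_tags("<p>Some <strong>bold</strong> text</p>"))  # set()
    print(disallowed_tags("<div>Hello</div>"))  # {'div'}
```

Children of a disallowed element are still walked, so every offending tag in a value is reported at once rather than one per upload attempt.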

Impact

Any input imported into Practique must be XML that adheres to the rules above. In practice, users supply data in two main forms: plain text and HTML.

The examples below demonstrate the rules. For each row, Original shows what users may attempt to import and Transformed (by users) shows the corrected version that must be supplied instead.


Row 2
Original: Equation for general relativity is e=mc2
Transformed (by users): Equation for general relativity is e=mc<sup>2</sup>
Notes: HTML markup
Rule: IV

Row 3
Original: What are consequences of cholesterol level > 500
Transformed (by users): What are consequences of cholesterol level &gt; 500
Notes: Plain text
Rule: III

Row 4
Original: How would Drag & Drop user interface be used by users?
Transformed (by users): How would Drag &amp; Drop user interface be used by users?
Notes: Plain text
Rule: III

Row 5
Original:
A 50 year old man with bald head<br>
complains about leg pain.
Transformed (by users):
A 50 year old man with bald head<br/>
complains about leg pain.
Notes: Valid XML; every element must be closed, so <br> becomes the self-closing <br/>
Rule: I

Row 6
Original:
<p>This is <em>emphasis</em> and some <strong>bold</strong> text. There is also an observation table below</p>
<table>
<tr><td>Key</td><td>Value</td></tr>
<tr><td>Key</td><td>Value</td></tr>
</table>
Transformed (by users): unchanged
Notes: HTML markup, no transformation necessary, valid input
Rule: V
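To find which rows of a CSV file would fail the well-formedness check from Rule I before uploading, a sketch like the following can help. It is not Practique's validator; the function name is hypothetical, it assumes the first row is a header, and it numbers rows the way a spreadsheet does, so the first data row is row 2. Python's csv module keeps a quoted value that spans several lines (as in a multi-line question) within a single row.

```python
import csv
import io
import xml.etree.ElementTree as ET


def invalid_rows(csv_text: str) -> list[int]:
    """Return spreadsheet-style row numbers (header = row 1) containing a value that is not well-formed XML."""
    bad: list[int] = []
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    for row_number, row in enumerate(reader, start=2):
        for cell in row:
            try:
                ET.fromstring(f"<root>{cell}</root>")  # hypothetical wrapper element
            except ET.ParseError:
                bad.append(row_number)
                break  # one failing value is enough to flag the row
    return bad


if __name__ == "__main__":
    sample = (
        "Question\n"
        "What are consequences of cholesterol level &gt; 500\n"
        "How would Drag & Drop be used?\n"
        '"A 50 year old man with bald head<br>\ncomplains about leg pain."\n'
    )
    print(invalid_rows(sample))  # [3, 4]
```

The text passed in should already be decoded from UTF-8 or UTF-16 (Rule II); the sketch does not check the allowed-markup list from Rule V.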