Corona Overview
This overview walks you through the Corona REST endpoint APIs. It's designed to be a quick read, with frequent links to Corona wiki pages for more details. If you do nothing but read this overview you'll get an understanding of what Corona can do; if you want to actually do those things you should follow the links.
To start with, Corona assumes three job roles for individuals:
- The developer. This person does the day-to-day programming against the Corona endpoints. They're a pro with Java, .NET, Ruby, or some other language, and the Corona documentation is the only exposure they have to MarkLogic.
- The developer admin. This person controls Corona's administrative settings: for example, query settings, the stored transformers that may be called, and index settings. They do this through Corona endpoints that are separate from those available to the regular developer, and they do not access MarkLogic's administrative port 8001.
- The database admin. This person installs MarkLogic and uses port 8001 to manage forests and system uptime and to get Corona installed and running. They're the classic IT database administrator, often not a programmer.
This overview assumes you're a developer or developer admin.
Corona stores XML, JSON, text and binary documents. There are four web endpoints for CRUD operations:
Each document can have the following metadata:
- A Name - a unique name with a leading slash; additional slashes place the document in a directory hierarchy
- Permissions - security rules for which roles can view and modify the document; users and roles are managed by the database administrator
- Properties - key-value metadata for the document
- Collections - named groupings of documents, an alternative to directories based on the name
- A Quality - an integer representing the intrinsic relevance of a document in a search
- Extracted text and other metadata - for binaries; for example, when a JPEG is inserted, its EXIF data is extracted as metadata
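To make the metadata list concrete, here is a minimal Python sketch of how a client might model a document's metadata before sending it to Corona. The class and field names (`DocumentMetadata`, `permissions` as a role-to-capabilities map, `extracted`) are hypothetical illustrations, not Corona's wire format; the only rules the sketch enforces are the ones stated above: a name starts with a slash, its slashes imply a directory, and the quality is an integer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DocumentMetadata:
    """Hypothetical client-side model of a Corona document's metadata."""

    name: str
    permissions: Dict[str, List[str]] = field(default_factory=dict)  # role -> capabilities (hypothetical shape)
    properties: Dict[str, str] = field(default_factory=dict)         # key-value metadata
    collections: List[str] = field(default_factory=list)             # named groupings
    quality: int = 0                                                  # intrinsic relevance
    extracted: Optional[Dict[str, str]] = None                        # e.g. EXIF fields for a JPEG

    def __post_init__(self) -> None:
        # Names must begin with a slash.
        if not self.name.startswith("/"):
            raise ValueError(f"document name must start with '/': {self.name!r}")
        if not isinstance(self.quality, int):
            raise TypeError("quality must be an integer")

    @property
    def directory(self) -> str:
        """Directory implied by the slashes in the name, e.g. '/blog/2011/' for '/blog/2011/post.json'."""
        return self.name[: self.name.rfind("/") + 1]


meta = DocumentMetadata("/blog/2011/post.json", collections=["blog"], quality=2)
print(meta.directory)  # /blog/2011/
```

Collections are kept independent of the name here, which mirrors the point above that they are an alternative to name-based directories.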
At present, Corona does not allow modification other than full document replacement. Partial modification is planned for the future.
Sometimes you want a full document back and sometimes just a piece. To specify a piece you provide an extra parameter on the retrieval call. For XML documents this parameter is a simplified XPath expression; for JSON documents it's a JSON path. This cuts down wire transmission overhead, especially for larger documents.
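As an illustration of subselection, the sketch below applies a simple path such as `author.name` or `tags[1]` to a parsed JSON document and returns only that piece. This dot-and-index path syntax is an assumption chosen for the example; Corona's own JSON path syntax is documented separately and may differ.

```python
import re
from typing import Any

# Hypothetical path syntax for this sketch: dot-separated keys plus [n] array indexes.
_STEP = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def select(doc: Any, path: str) -> Any:
    """Return the part of `doc` addressed by a path such as 'author.name' or 'tags[1]'."""
    node = doc
    for key, index in _STEP.findall(path):
        # Each match fills exactly one group: a key name or an array index.
        node = node[key] if key else node[int(index)]
    return node


doc = {"title": "Corona", "author": {"name": "Ann"}, "tags": ["xml", "json"]}
print(select(doc, "author.name"))  # Ann
print(select(doc, "tags[1]"))      # json
```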
To transform the document as part of its retrieval, you can specify a parameter on the retrieval call indicating the name of a transformer that should process the document. (You can transform a document as part of the insert call as well.)
Note: Transformers are specified by name, not as code. The library of available XSLT stylesheets and XQuery modules has to have been established earlier by a "developer admin", using a separate and secured transformers management endpoint. This prohibits regular developers from invoking arbitrary code.
(If doing both a subselection and a transformation, the subselection will occur first.)
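The following sketch models the two rules above: transformers are looked up by name in a registry that only an administrator populates, and subselection happens before transformation. `TRANSFORMERS`, `register_transformer`, and `retrieve` are hypothetical names, and a top-level key stands in for a real XPath or JSON path subselection.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical registry: only a developer admin would add entries.
TRANSFORMERS: Dict[str, Callable[[Any], Any]] = {}


def register_transformer(name: str, fn: Callable[[Any], Any]) -> None:
    """Admin-side operation in this sketch: publish a transformer under a name."""
    TRANSFORMERS[name] = fn


def retrieve(doc: Dict[str, Any], key: Optional[str] = None,
             transformer: Optional[str] = None) -> Any:
    """Developer-side retrieval: subselect first, then run the named transformer."""
    result = doc[key] if key is not None else doc  # subselection happens first
    if transformer is not None:
        if transformer not in TRANSFORMERS:
            raise KeyError(f"unknown transformer: {transformer}")
        result = TRANSFORMERS[transformer](result)  # then the transformation
    return result


register_transformer("shout", lambda value: str(value).upper())
print(retrieve({"title": "hello", "body": "text"}, key="title", transformer="shout"))  # HELLO
```

Because a caller can only name an entry that already exists in the registry, it has no way to supply code of its own.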
Corona includes robust support for queries. Queries can behave like those in a traditional database, with value, scalar, and geospatial constraints, or like a search engine, with free-text, relevance-based, language-aware constraints.
Results can be sorted by relevance or (soon) by a scalar such as a date or price.
Result items can be paged (to view, say, 10 results at a time), snippeted (to show a blurb containing the matching terms), and highlighted (to bold the matching words).
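Paging, snippeting, and highlighting are performed by Corona on the server; the sketch below only illustrates what each step produces. The 1-based `start` position, the snippet window measured in characters, and `<b>` tags for highlighting are assumptions made for the example.

```python
import re
from typing import List, Sequence


def page(results: Sequence[str], start: int, length: int = 10) -> List[str]:
    """One page of results; `start` is a 1-based position (an assumption of this sketch)."""
    return list(results[start - 1 : start - 1 + length])


def snippet(text: str, term: str, width: int = 20) -> str:
    """A blurb around the first match of `term`, with every match in it wrapped in <b> tags."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    match = pattern.search(text)
    if match is None:
        return ""
    blurb = text[max(0, match.start() - width) : match.end() + width]
    return pattern.sub(lambda m: f"<b>{m.group(0)}</b>", blurb)


hits = [f"doc-{i}" for i in range(1, 26)]
print(page(hits, start=11))  # doc-11 through doc-20
print(snippet("The quick brown fox", "quick", width=4))  # The <b>quick</b> bro
```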
A search result can include a simple description of the matching documents, or it can include the documents themselves, which saves repeated calls. When documents are fetched as part of a search, the same subselection and transformation features are available.
Corona includes three endpoints for issuing search queries:
This is a simple endpoint for quick retrieval of documents in which a given key equals a given value.
This is a user-friendly way to specify a query as a specially marked-up string, similar to those used by Google. A developer could pass such a string directly from a user interface text box to the Corona back end for execution. It accepts queries using the string query syntax.
This is a programmer-friendly way to specify a query as a set of hierarchical query constraints expressed using a JSON encoding. It accepts queries using the structured query syntax.
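A structured query is a tree of constraints serialized as JSON. The helpers below show one way a client could build such a tree; the constraint keys (`and`, `key`/`equals`, `contains`) are placeholders chosen for the sketch, so check the structured query syntax page for the actual names.

```python
import json
from typing import Any, Dict

Query = Dict[str, Any]


def and_(*queries: Query) -> Query:
    """Hypothetical combinator: every child constraint must match."""
    return {"and": list(queries)}


def key_equals(key: str, value: Any) -> Query:
    """Hypothetical leaf constraint: the value at `key` equals `value`."""
    return {"key": key, "equals": value}


def contains(text: str) -> Query:
    """Hypothetical free-text constraint."""
    return {"contains": text}


query = and_(key_equals("status", "published"), contains("marklogic"))
body = json.dumps(query)  # request body a client might send to the structured query endpoint
print(body)
```

Building the tree from small functions keeps nesting explicit, which is the main advantage of the structured form over a string query.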
It's often necessary for a "developer admin" to configure some aspects of the Corona environment to facilitate effective queries. For example:
Places
A Place gives an assigned name to a set of locations in a document, either JSON keys or XML nodes. For example, RSS has a variety of formats. A single place called "title" could be created that aliases "rdf:title" (RSS 0.9), "title" (RSS 0.91 through 2.0, no namespace), and "atom:title" (Atom 1.0) into one.
Queries can use Places to indicate where a query constraint should apply. In string queries the Place name automatically becomes a field prefix, available to the user. A user can type title:"all the king's men" and Corona will understand that the phrase has to appear in one of the locations specified by the Place "title".
There's also a special place, the place without a name, which controls the behavior of searches that aren't field constrained. This is very important because the majority of users won't type fielded constraints.
When defining a Place you can assign relevance weights to each specified location. This helps maintain high-quality relevance-sorted results.
Places are managed by the Places endpoint.
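The sketch below shows how a fielded string query could be mapped onto Places: a known prefix such as `title:` selects that Place's weighted locations, and an unprefixed term (or an unknown prefix) falls back to the unnamed place. The Place table, its weights, and the tokenizing regular expression are hypothetical; this tokenizer understands only `field:word`, `field:"phrase"`, and bare words, whereas the real string query syntax is richer.

```python
import re
from typing import Dict, List, Tuple

Locations = List[Tuple[str, float]]

# Hypothetical Place definitions: name -> list of (location, relevance weight).
PLACES: Dict[str, Locations] = {
    "title": [("rdf:title", 1.0), ("title", 1.0), ("atom:title", 1.0)],
    "": [("title", 2.0), ("body", 1.0)],  # the unnamed place, used for unfielded terms
}

# Optional "field:" prefix, then either a "quoted phrase" or a bare token.
TOKEN = re.compile(r'(?:(\w+):)?(?:"([^"]*)"|(\S+))')


def resolve(query: str) -> List[Tuple[str, Locations]]:
    """Pair each term of a string query with the locations of the Place it targets."""
    resolved = []
    for field_name, phrase, word in TOKEN.findall(query):
        term = phrase or word
        if field_name and field_name in PLACES:
            place = field_name
        else:
            # Unknown prefix: keep it as part of the term and search the unnamed place.
            term = f"{field_name}:{term}" if field_name else term
            place = ""
        resolved.append((term, PLACES[place]))
    return resolved


for term, locations in resolve('title:"all the king\'s men" fox'):
    print(term, locations)
```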
Ranges
A Range gives an assigned name to a location in a document, either a JSON key or XML node, that should be treated as a scalar value. For example, a "birthday" element might be assigned the date type.
Each range creates an index in the background that enables:
- Fast range queries on that scalar (e.g. limiting to dates between X and Y)
- (Soon) Optimized sorting of results by that scalar (e.g. sort by date)
- Fast extraction of the scalar's values (e.g. show birthday occurrences by month)
Range values can be assigned into named "buckets". Each bucket represents a subset of possible values for the scalar. For example, timestamps can be bucketed into days, dates can be bucketed into months, or prices can be bucketed into "Cheap" and "Expensive".
Ranges are managed by the Range endpoint and the Bucketed Range endpoint.
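Bucketing can be pictured as mapping each scalar value to a label. This sketch uses Python's `bisect` module to assign prices to "Cheap" and "Expensive" buckets split at a hypothetical threshold of 50, and labels dates by year and month. In Corona the buckets are defined through the Bucketed Range endpoint rather than in client code; the function names here are illustrative only.

```python
import bisect
from datetime import date
from typing import Sequence


def bucket_for(value: float, bounds: Sequence[float], names: Sequence[str]) -> str:
    """Name of the bucket holding `value`.

    `bounds` must be sorted; each bound starts a new bucket, so a value equal to a
    bound falls in the higher bucket. len(names) must equal len(bounds) + 1.
    """
    return names[bisect.bisect_right(bounds, value)]


def month_bucket(d: date) -> str:
    """Bucket a date by month, e.g. date(2011, 3, 14) -> '2011-03'."""
    return d.strftime("%Y-%m")


# Hypothetical price buckets split at 50.
PRICE_BOUNDS, PRICE_NAMES = [50.0], ["Cheap", "Expensive"]
print(bucket_for(19.99, PRICE_BOUNDS, PRICE_NAMES))  # Cheap
print(bucket_for(50.0, PRICE_BOUNDS, PRICE_NAMES))   # Expensive
print(month_bucket(date(2011, 3, 14)))               # 2011-03
```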
Named Queries
If a particular query is to be reused frequently, either alone or in combination with other query constraints, then for convenience and performance it can be registered as a named query. The query can then be referenced in the structured query syntax just by specifying its name. Internally a named query gets optimized so repeated calls will be fast.
Named queries are managed by the Named Query endpoint.
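Conceptually, referencing a named query is like splicing the stored query into the spot where its name appears. The sketch below performs that splice on the client side using a hypothetical `{"namedQuery": ...}` reference and hypothetical constraint keys; it does not model the server-side optimization that makes real named queries fast.

```python
import copy
from typing import Any, Dict

# Hypothetical registry of stored queries, as a developer admin might set it up.
NAMED_QUERIES: Dict[str, Dict[str, Any]] = {
    "recent-posts": {"and": [{"key": "type", "equals": "post"},
                             {"key": "year", "equals": 2011}]},
}


def expand(query: Any) -> Any:
    """Recursively replace {"namedQuery": name} references with a copy of the stored query."""
    if isinstance(query, dict):
        if set(query) == {"namedQuery"}:
            return expand(copy.deepcopy(NAMED_QUERIES[query["namedQuery"]]))
        return {k: expand(v) for k, v in query.items()}
    if isinstance(query, list):
        return [expand(q) for q in query]
    return query


combined = {"and": [{"namedQuery": "recent-posts"}, {"contains": "corona"}]}
print(expand(combined))
```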
Namespaces
XML Namespaces are centrally managed in Corona. This allows all references to namespaced elements and attributes in Corona to simply use the namespace prefix and rely on the central management system to dictate the associated URI.
Namespaces are managed by the Namespaces endpoint.
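With a central table, any prefixed name can be expanded to its full namespace URI. The sketch below expands `atom:title` into Clark notation (`{uri}local`, the form Python's ElementTree uses). The table and function are hypothetical stand-ins for Corona's central registry; the Atom 1.0 namespace URI is the real one.

```python
from typing import Dict

# Hypothetical central prefix -> URI table.
NAMESPACES: Dict[str, str] = {"atom": "http://www.w3.org/2005/Atom"}


def expand_qname(qname: str) -> str:
    """Turn 'atom:title' into '{http://www.w3.org/2005/Atom}title' using the central table."""
    prefix, sep, local = qname.partition(":")
    if not sep:
        return qname  # no prefix: the name is in no namespace
    if prefix not in NAMESPACES:
        raise KeyError(f"namespace prefix not registered: {prefix}")
    return "{" + NAMESPACES[prefix] + "}" + local


print(expand_qname("atom:title"))  # {http://www.w3.org/2005/Atom}title
```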
To execute a facet query, you specify a Range name or names as well as an optional query, and Corona returns all the distinct values (or distinct bucket names) for documents matching the query, as well as the frequency count for each.
It's a fairly simple idea, but it's tremendously powerful and enables accurate analytics against documents without pre-defining your dimensions. This technique is how MarkMail.org produces the facets on the left-hand side of each search result.
Facets are executed using the facet query endpoint.
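In terms of results, a facet query returns each distinct value among the matching documents together with how many documents have it. The in-memory sketch below computes the same thing with `collections.Counter`; Corona instead reads the values from the range index, and the `facet` function and its arguments are hypothetical.

```python
from collections import Counter
from typing import Any, Callable, Dict, Iterable, List, Tuple

Doc = Dict[str, Any]


def facet(docs: Iterable[Doc], key: str,
          query: Callable[[Doc], bool] = lambda d: True) -> List[Tuple[Any, int]]:
    """Distinct values of `key` among documents matching `query`, most frequent first."""
    counts = Counter(d[key] for d in docs if key in d and query(d))
    return counts.most_common()


docs = [
    {"year": 2010, "tag": "xml"},
    {"year": 2011, "tag": "xml"},
    {"year": 2011, "tag": "json"},
    {"year": 2011, "tag": "xml"},
]
print(facet(docs, "year"))                                 # [(2011, 3), (2010, 1)]
print(facet(docs, "year", query=lambda d: d["tag"] == "xml"))  # [(2011, 2), (2010, 1)]
```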
Multiple REST requests can be grouped together into a single transaction, which at the end can be atomically committed or rolled back. There's one endpoint to start a transaction, one to commit it, and one to roll it back. Many requests accept an optional token indicating which transaction the request belongs to.
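The transaction flow described above (begin, send requests carrying a token, then commit or roll back) is sketched below with a toy in-memory store. `ToyStore`, its methods, and the `txid` parameter are hypothetical stand-ins for the three transaction endpoints and the per-request token; the sketch shows only the visibility semantics, not how MarkLogic implements transactions.

```python
import uuid
from typing import Dict, Optional


class ToyStore:
    """Hypothetical in-memory stand-in illustrating token-scoped transactions."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}
        self._pending: Dict[str, Dict[str, str]] = {}

    def begin(self) -> str:
        """Start a transaction and return its token."""
        token = uuid.uuid4().hex
        self._pending[token] = {}
        return token

    def put(self, name: str, content: str, txid: Optional[str] = None) -> None:
        if txid is None:
            self.docs[name] = content            # outside a transaction: visible immediately
        else:
            self._pending[txid][name] = content  # buffered until commit

    def commit(self, txid: str) -> None:
        self.docs.update(self._pending.pop(txid))  # all buffered writes become visible together

    def rollback(self, txid: str) -> None:
        self._pending.pop(txid)  # discard every buffered write


store = ToyStore()
tx = store.begin()
store.put("/a.json", '{"n": 1}', txid=tx)
store.put("/b.json", '{"n": 2}', txid=tx)
store.commit(tx)
print(sorted(store.docs))  # ['/a.json', '/b.json']
```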
Environment variables allow a Corona administrator to adjust aspects of the Corona runtime. There are numerous built-in environment variables for things such as setting a standard transformer to run on all inserts or fetches, enabling debug logging, and setting the default output format. The list of environment variables is not limited to the defaults; user-provided extension endpoints can access their own environment variables to adjust their own behavior. See the environment variables documentation.
Often you want to review the server's status and settings: version information, all defined Places, Ranges, Bucketed Ranges, and Namespaces, all index settings, and a summary of documents in the database.
The Server Status endpoint provides this.
JSON provides just six native datatypes: objects, arrays, numbers, strings, booleans, and nulls. Corona extends these datatypes to support Date and XML datatypes via JSON Extensions.
Corona is a work in progress. Its APIs are subject to change. It's not a coincidence that this overview is hosted in an easy-to-change wiki. We're at the stage now where we're looking for people's feedback, so please explore, and let us know what you think. You can file issues on GitHub with your bug reports or RFE ideas. You can also message "hunterhacker" on GitHub or email Jason dot Hunter at MarkLogic dot com. There is also a mailing list at http://developer.marklogic.com/mailman/listinfo/corona