This repository has been archived by the owner on Nov 9, 2022. It is now read-only.

Corona Overview

hunterhacker edited this page Dec 11, 2011 · 24 revisions

This overview walks you through the Corona REST endpoint APIs. It's designed to be a quick read, with frequent links to Corona wiki pages for more details. If you do nothing but read this overview you'll get an understanding of what Corona can do; if you want to actually do those things you should follow the links.

Three Roles

To start with, Corona assumes three job roles for individuals:

  1. The developer. This person does the day-to-day programming against the Corona endpoints. They're a pro with Java, .NET, Ruby, or some other language, and the Corona documentation is the only exposure they have to MarkLogic.

  2. The developer admin. This person controls Corona's administrative settings. For example, they adjust current query settings, any stored transformations which may be called, and index settings. They do this via Corona endpoints separate from those available to the regular developer. They do not access MarkLogic's administrative port 8001.

  3. The database admin. This person installs MarkLogic and uses port 8001 to manage forests, monitor system uptime, and get Corona installed and started. They're the classic IT database administrator, often not a programmer.

This overview assumes you're a developer or developer admin.

Basic Document Storage

Corona stores XML, JSON, text, and binary documents. Four web endpoints handle the CRUD operations.

Each document can have certain metadata:

  • A Name - a unique name with a leading slash; additional slashes place the document in a directory hierarchy

  • Permissions - security rules for what roles can view and modify the document; users and roles are managed by the database administrator

  • Properties - key-value metadata for the document

  • Collections - named groupings of documents, as an alternative to directories based on the name

  • A Quality - an integer representing the intrinsic relevance of a document in a search

  • Extracted text and other metadata - for binary documents

To transform the document before it is saved, you can specify another parameter on the insertion call indicating the name of a transformer (XSLT stylesheet or XQuery module) that should process the document. Today this is limited to XML documents.
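As a rough sketch of what an insertion call carrying metadata and a transformer name might look like from client code, the following builds a request URL. The host and port, the `/store` path, and the parameter names (`uri`, `collection`, `quality`, `applyTransform`) are assumptions made for illustration; the linked endpoint pages are the authority on the real names.

```python
from urllib.parse import urlencode


def build_insert_url(base, name, collections=(), quality=None, transformer=None):
    """Build the URL for a hypothetical document-insertion call."""
    if not name.startswith("/"):
        raise ValueError("document names need a leading slash")
    # Hypothetical parameter names; one `collection` pair per collection.
    params = [("uri", name)]
    params += [("collection", c) for c in collections]
    if quality is not None:
        params.append(("quality", str(quality)))
    if transformer is not None:
        # The transformer is referenced by name only, never sent as code.
        params.append(("applyTransform", transformer))
    return f"{base}/store?{urlencode(params)}"


url = build_insert_url(
    "http://localhost:8123",          # assumed host and port
    "/articles/2011/intro.xml",
    collections=["drafts"],
    quality=5,
    transformer="cleanup",            # assumed transformer name
)
print(url)
```

The document body itself would travel in the request body; only the name, metadata, and transformer name ride on the URL in this sketch.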

At present, Corona does not allow modification other than full document replacement. Partial updates are planned for the future.

Document Retrieval

Sometimes you want a full document back and sometimes just a piece. To specify a piece you provide an extra parameter on the retrieval call. For XML documents this parameter is a simplified XPath expression; for JSON documents it's a JSON path. This cuts down wire transmission overhead, especially for larger documents.
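To make "just a piece" concrete, here is a minimal client-side illustration of subselecting from a JSON document. The dotted path syntax is an assumption chosen for brevity, not the path syntax Corona accepts, and Corona performs the selection on the server so that only the piece crosses the wire.

```python
import json


def select(document, path):
    """Walk a dotted path such as 'author.name' or 'tags.1' through parsed JSON."""
    node = document
    for step in path.split("."):
        if isinstance(node, list):
            node = node[int(step)]   # numeric steps index into arrays
        else:
            node = node[step]        # other steps look up object keys
    return node


doc = json.loads(
    '{"title": "Intro", "author": {"name": "Ann", "email": "ann@example.com"},'
    ' "tags": ["draft", "overview"]}'
)
print(select(doc, "author.name"))
print(select(doc, "tags.1"))
```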

To transform the document as part of its retrieval, you can specify another parameter on the retrieval call indicating the name of a transformer (XSLT stylesheet or XQuery module) that should process the document. Today this is limited to XML documents.

You specify the transformer by name, not as code. The library of available transformers has to have been established earlier by a "developer admin", using a separate and secured [content transformers] management endpoint. This prevents regular developers from invoking arbitrary code.

(If doing both a subselection and a transformation, the subselection will occur first.)

Search Queries

Corona includes robust support for queries. Queries can behave like those of a traditional database, with value, scalar, and geospatial constraints, or like those of a search engine, with free-text, relevance-based, language-aware constraints.

Results can be sorted by relevance or (soon) by a scalar such as a date or price.
Result items can be paged (to view 10 results at a time), snippeted (to show a blurb containing the matching terms), and highlighted (to bold the matching words).
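The paging and highlighting behavior described above can be pictured with a small sketch. Corona does this work on the server; the function names, the 1-based `start` convention, and the `<b>` markup here are assumptions for illustration only.

```python
import re


def page(results, start=1, length=10):
    """Return one page of results; `start` is the 1-based index of the first item."""
    return results[start - 1:start - 1 + length]


def highlight(text, terms, open_tag="<b>", close_tag="</b>"):
    """Wrap each whole-word occurrence of a term in tags, ignoring case."""
    alternatives = "|".join(re.escape(t) for t in terms)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: open_tag + m.group(1) + close_tag, text)


hits = [f"/docs/{n}.xml" for n in range(1, 26)]   # 25 made-up result names
second_page = page(hits, start=11)                 # results 11 through 20
blurb = highlight("A review of All the King's Men", ["king's", "men"])
print(second_page[0], second_page[-1])
print(blurb)
```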

A search result can include a simple description of the matching documents, or include the documents within the result as well, for efficiency in avoiding repeated calls. When fetching the documents as part of a search, the same subsetting and transformation features are available.

Corona includes three endpoints for issuing search queries:

Key/Value Query Service

This is a simple endpoint, for executing a quick retrieval based on a key that's equal to a certain value.

String Query Service

This is a user-friendly way to specify a query as a specially marked-up string similar to those used by Google. This is something a developer could pass directly from the user interface text box to the Corona back-end for execution. It accepts queries using the string query syntax.

Structured Query Service

This is a programmer-friendly way to specify a query as a set of hierarchical query constraints expressed using a JSON encoding. It accepts queries using the structured query syntax.
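As an illustration of what "hierarchical query constraints expressed using a JSON encoding" can mean, the sketch below nests two constraints under a boolean operator and serializes the result. The key names (`and`, `place`, `contains`, `range`, `from`, `to`) are assumptions; the structured query syntax page defines the real vocabulary.

```python
import json


def all_of(*constraints):
    """Combine constraints so that every one of them must match."""
    return {"and": list(constraints)}


# Hypothetical constraint shapes, chosen only to show the nesting.
title_constraint = {"place": "title", "contains": "king's men"}
date_constraint = {"range": "published", "from": "2011-01-01", "to": "2011-12-31"}

query = all_of(title_constraint, date_constraint)
encoded = json.dumps(query)   # the string a client would send to the endpoint
print(encoded)
```

Because the query is plain data, client code can assemble it from smaller pieces rather than concatenating strings.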

Search Configuration Management

It's often necessary for a "developer admin" to configure some aspects of the Corona environment to facilitate effective queries. For example:

Places

A Place gives an assigned name to a set of locations in a document, either JSON keys or XML nodes. For example, RSS has a variety of formats. A single place called "title" could be created that aliases "rdf:title" (RSS 0.9), "title" (RSS 0.91 thru 2.0, no namespace), and "atom:title" (Atom 1.0) into one.

Queries can use Places to indicate where a query constraint should apply. In string queries the Place name automatically becomes a field prefix, available to the user. A user can type title:"all the king's men" and Corona will understand that the phrase has to appear in one of the locations specified by the Place "title".

There's also a special place, the place without a name, which controls the behavior of searches that aren't field constrained. This is very important because the majority of users won't type fielded constraints.

When defining a Place you can assign relevance weights to each specified location. This helps maintain high-quality relevance-sorted results.
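The following sketch shows one way to picture how a Place name used as a field prefix resolves to weighted locations, and how an unfielded query falls back to the unnamed Place. The data layout, the weights, and the `resolve` function are hypothetical; Corona's own parsing and storage are not shown in this overview.

```python
import re

# Hypothetical Place definitions: name -> list of (location, weight).
PLACES = {
    "title": [("rdf:title", 1.0), ("title", 1.0), ("atom:title", 1.0)],
    "": [("title", 2.0), ("description", 1.0)],   # the unnamed Place
}

# Matches a fielded phrase such as  title:"all the king's men"
FIELDED = re.compile(r'^(\w+):"([^"]+)"$')


def resolve(query):
    """Return (phrase, weighted locations) for a one-term string query."""
    text = query.strip()
    match = FIELDED.match(text)
    if match and match.group(1) in PLACES:
        return match.group(2), PLACES[match.group(1)]
    # No recognized field prefix: search wherever the unnamed Place points.
    return text, PLACES[""]


print(resolve('title:"all the king\'s men"'))
print(resolve("king"))
```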

Places are managed by the Places endpoint.

Ranges

A Range gives an assigned name to a location in a document, either a JSON key or XML node, that should be treated as a scalar value. For example, a "birthday" element might be assigned the type of date.

Each range creates an index in the background that enables:

  1. Fast range queries on that scalar (e.g. limiting to dates between X and Y)

  2. (Soon) Optimized sorting of results by that scalar (e.g. sort by date)

  3. Fast extraction of the scalar's values (e.g. show birthday occurrences by month)
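To see why an index over a scalar makes these three operations cheap, consider a toy index kept as a sorted list. This is a conceptual sketch only and says nothing about how MarkLogic implements its indexes; the class name and the sample data are invented.

```python
from bisect import bisect_left, bisect_right
from datetime import date


class RangeIndex:
    """Toy index: (value, document name) pairs kept sorted by value."""

    def __init__(self, pairs):
        self._entries = sorted(pairs)
        self._values = [value for value, _ in self._entries]

    def between(self, low, high):
        """Names of documents whose value lies in [low, high], via binary search."""
        lo = bisect_left(self._values, low)
        hi = bisect_right(self._values, high)
        return [name for _, name in self._entries[lo:hi]]

    def sorted_names(self):
        """Document names already ordered by the scalar, with no sort at query time."""
        return [name for _, name in self._entries]

    def values(self):
        """Every indexed value, read straight from the index."""
        return list(self._values)


birthdays = RangeIndex([
    (date(1992, 11, 5), "/people/d.json"),
    (date(1975, 3, 2), "/people/a.json"),
    (date(1988, 1, 30), "/people/c.json"),
    (date(1984, 7, 9), "/people/b.json"),
])
print(birthdays.between(date(1980, 1, 1), date(1989, 12, 31)))
```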

Range values can be assigned into named "buckets". Each bucket represents a subset of possible values for the scalar. For example, timestamps can be bucketed into days, dates can be bucketed into months, or prices can be bucketed into "Cheap" and "Expensive".
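Bucketing amounts to mapping each scalar value to the name of the subset it falls in. A minimal sketch, assuming a single made-up price threshold of 50 and a year-month label for dates:

```python
from bisect import bisect_right
from datetime import date


def price_bucket(price, bounds=(50,), labels=("Cheap", "Expensive")):
    """Name the bucket a price falls in; a price equal to a bound goes to the upper bucket."""
    return labels[bisect_right(bounds, price)]


def month_bucket(day):
    """Bucket a date into its year and month, e.g. '2011-12'."""
    return day.strftime("%Y-%m")


print(price_bucket(19.99), price_bucket(50), price_bucket(120))
print(month_bucket(date(2011, 12, 11)))
```

Adding a bucket means adding one bound and one label, so `bounds=(50, 200)` with three labels would split prices three ways.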

Ranges are managed by the Range endpoint and Bucketed range endpoint.

Namespaces

XML Namespaces are centrally managed in Corona. This allows all references to namespaced elements and attributes in Corona to simply use the namespace prefix and rely on the central management system to dictate the associated URI.
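The idea of a central prefix registry can be sketched as a single lookup table that every prefixed name is resolved against. The table contents and the `expand` function are illustrative assumptions; in Corona the registry lives on the server and is maintained by the developer admin.

```python
# Hypothetical central registry: prefix -> namespace URI.
NAMESPACES = {
    "atom": "http://www.w3.org/2005/Atom",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(qname):
    """Split 'prefix:local' into (namespace URI, local name); no prefix means no namespace."""
    prefix, separator, local = qname.partition(":")
    if not separator:
        return None, qname
    return NAMESPACES[prefix], local   # raises KeyError for an unregistered prefix


print(expand("atom:title"))
print(expand("title"))
```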

Namespaces are managed by the Namespaces endpoint.

Facets

To execute a facet query, you specify a Range name or names as well as an optional query, and Corona returns all the distinct values (or distinct bucket names) for documents matching the query, as well as the frequency count for each.
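Conceptually, a facet is a frequency count of a Range's values over the documents that match a query. The sketch below computes one in memory over invented sample data; Corona answers the same question on the server using the Range's index, and the function signature here is an assumption.

```python
from collections import Counter

# Made-up documents, each carrying values for two hypothetical Ranges.
docs = [
    {"name": "/mail/1.xml", "list": "dev", "month": "2011-11"},
    {"name": "/mail/2.xml", "list": "dev", "month": "2011-12"},
    {"name": "/mail/3.xml", "list": "users", "month": "2011-12"},
    {"name": "/mail/4.xml", "list": "dev", "month": "2011-12"},
]


def facet(documents, range_name, matches=lambda doc: True):
    """Count each distinct value of `range_name` among documents passing `matches`."""
    return Counter(doc[range_name] for doc in documents if matches(doc))


print(facet(docs, "list"))
print(facet(docs, "month", matches=lambda doc: doc["list"] == "dev"))
```

The optional `matches` argument plays the role of the optional query: leaving it out counts values across every document.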

It's a fairly simple idea, but it's tremendously powerful and enables accurate analytics against documents without pre-defining your dimensions. This technique is how MarkMail.org produces the facets on the left-hand side of each search result.

Facets are executed using the facet query endpoint.

Server Status

Often you want to review the server's status and settings: version information, all defined Places, Ranges, Bucketed Ranges, and Namespaces, all index settings, and a summary of documents in the database.

The Server Status endpoint provides this.

JSON Extensions

JSON provides just six native datatypes: objects, arrays, numbers, strings, booleans, and nulls. Corona extends these datatypes to support Date and XML datatypes via JSON Extensions.

Feedback

Corona is a work in progress. Its APIs are subject to change. It's not a coincidence that this overview is hosted in an easy-to-change wiki. We're at the stage now where we're looking for people's feedback, so please explore, and let us know what you think. You can file issues on GitHub with your bug reports or RFE ideas. You can also message "hunterhacker" on GitHub or email Jason dot Hunter at MarkLogic dot com.
