diff --git a/src/assets/screenshots/enterprise/project-management/sources/files_context_menu_settings.png b/src/assets/screenshots/enterprise/project-management/sources/files_context_menu_settings.png
new file mode 100644
index 00000000..2a177590
Binary files /dev/null and b/src/assets/screenshots/enterprise/project-management/sources/files_context_menu_settings.png differ
diff --git a/src/content/docs/crowdin/project-management/sources/custom-segmentation.mdx b/src/content/docs/crowdin/project-management/sources/custom-segmentation.mdx
index 276f4c64..f4f9022e 100644
--- a/src/content/docs/crowdin/project-management/sources/custom-segmentation.mdx
+++ b/src/content/docs/crowdin/project-management/sources/custom-segmentation.mdx
@@ -8,6 +8,7 @@ sidebar:
 import { Steps, Aside, LinkCard, CardGrid } from '@astrojs/starlight/components';
 import { Icon } from 'astro-icon/components';
 import { Image } from 'astro:assets';
+import Include from '~/components/Include.astro';
 import fileContextMenuSettings from '!/crowdin/project-management/sources/files_context_menu_settings.png';
 import fileParserConfiguration from '!/crowdin/project-management/sources/files_parser_configuration.png';

@@ -37,46 +38,7 @@ After you save your new segmentation rules, your source file will be automatical

 A typical SRX file looks similar to the following:

-```xml
-
-
-
-
-
-
-
-
-
-
-
- ^\s*[0-9]+\.
- \s
-
-
- \n
-
-
- [\.\?!]+
- \s
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-```
+

 ### Change Sentence Separator for Asian Languages

diff --git a/src/content/docs/enterprise/project-management/sources/custom-segmentation.mdx b/src/content/docs/enterprise/project-management/sources/custom-segmentation.mdx
index 0456fadc..5446f1db 100644
--- a/src/content/docs/enterprise/project-management/sources/custom-segmentation.mdx
+++ b/src/content/docs/enterprise/project-management/sources/custom-segmentation.mdx
@@ -5,4 +5,91 @@ sidebar:
   order: 5
 ---

-https://support.crowdin.com/enterprise/custom-segmentation/
+import { Steps, Aside, LinkCard, CardGrid } from '@astrojs/starlight/components';
+import { Icon } from 'astro-icon/components';
+import { Image } from 'astro:assets';
+import Include from '~/components/Include.astro';
+import fileContextMenuSettings from '!/enterprise/project-management/sources/files_context_menu_settings.png';
+import fileParserConfiguration from '!/enterprise/project-management/sources/files_parser_configuration.png';
+
+Each time you upload XML, HTML, MD, or any other source files without a key-value structure, the predefined segmentation rules (SRX 2.0) are used for automatic content segmentation. Although, there might be situations when the default segmentation rules segment source files in contrast to the desired expectations.
+
+In this case, you can define your own segmentation rules for each source file individually using the [SRX 2.0 standard](https://www.gala-global.org/srx-20-april-7-2008).
+
+## Change Segmentation
+
+You can change segmentation in **Sources > Files**.
+
+
+  1. Open the project where you’d like to adjust the segmentation rules and go to **Sources > Files**.
+  1. Click (or right-click) on the needed file and select **Settings**. File context menu settings
+  1. In the appeared dialog, switch to the **Parser configuration** tab.
+  1. In the **Excluded elements** field, specify all elements that should not be imported.
+  1. Select **Enable content segmentation** and **Use custom segmentation rules**.
+  1. Paste your SRX segmentation rules and click **Save**. File parser configuration
+
+
+After you save your new segmentation rules, your source file will be automatically reimported and segmented according to these new rules.
+
+## Segmentation Examples
+
+
+
+A typical SRX file looks similar to the following:
+
+
+
+### Change Sentence Separator for Asian Languages
+
+Usually, the full stop is used as a sentence separator. Although, for some Asian languages, it's not the case. For example, the typical sentence separator in Chinese is an ideographic full stop (`。`). For such cases, you may want to use the following ruleset:
+
+```xml
+
+ [\x3002]+
+
+
+```
+
+### Break Text into Smaller Parts
+
+In the following simple sentence, we'll break down a case when segmenting one text piece into two (or more) strings is necessary.
+
+Text with default segmentation rules:
+
+`This is the first part of the sample sentence and this is the second part.`
+
+Text with new segmentation rules:
+
+`This is the first part of the sample sentence`
+
+ and this is the second part.
+
+For this particular case, the following ruleset will break the initial sentence into two parts:
+
+```xml
+
+ sentence
+ \u0020
+
+```
+
+## Create Segmentation Rules with SRX Editors
+
+The SRX segmentation rules can be created and maintained with the help of tools like [Ratel](http://okapiframework.org/wiki/index.php?title=Ratel). It has a visual interface where you can generate segmentation rules from scratch or edit your existing ones.
+
+
+
+
+
diff --git a/src/content/includes/srx-file-example.mdx b/src/content/includes/srx-file-example.mdx
new file mode 100644
index 00000000..5e062de2
--- /dev/null
+++ b/src/content/includes/srx-file-example.mdx
@@ -0,0 +1,40 @@
+```xml
+
+
+
+
+
+
+
+
+
+
+
+ ^\s*[0-9]+\.
+ \s
+
+
+ \n
+
+
+ [\.\?!]+
+ \s
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+```
\ No newline at end of file
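
As a rough illustration of the `sentence` / `\u0020` rule that this patch adds under "Break Text into Smaller Parts", the Python sketch below applies a single break rule as a pair of regular expressions: the text is split at every position where the first pattern matches immediately before it and the second pattern matches immediately after it. The function name `split_at_break_rule` is hypothetical, and the sketch makes no claim about how Crowdin's own SRX engine is implemented; it ignores exception (no-break) rules and language maps.

```python
import re


def split_at_break_rule(text, before_pattern, after_pattern):
    """Split text using one SRX-style break rule (illustrative only).

    A break is placed at a position when `before_pattern` matches text
    ending at that position and `after_pattern` matches text starting there.
    """
    before = re.compile(before_pattern)
    after = re.compile(after_pattern)

    segments = []
    start = 0
    for match in before.finditer(text):
        pos = match.end()
        # Skip positions at the very start or end, which would only
        # produce empty segments.
        if 0 < pos < len(text) and after.match(text, pos):
            segments.append(text[start:pos])
            start = pos
    segments.append(text[start:])
    return segments


sample = "This is the first part of the sample sentence and this is the second part."
for part in split_at_break_rule(sample, "sentence", r"\u0020"):
    print(repr(part))
```

In this sketch the second segment keeps its leading space, because the break falls between `sentence` and the space that follows it; whether a real SRX tool trims that whitespace is not covered here.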