Commit

Merge pull request #78 from soee/checkMetaTagToDecideIfFileShouldBeIn…
…dexed

[TASK] Do not exclude Sitemap path from indexing, support x-typo3-ind…
linawolf authored Mar 26, 2024
2 parents 146107d + 9061611 commit a6a23db
Showing 5 changed files with 91 additions and 2 deletions.
24 changes: 23 additions & 1 deletion README.rst
@@ -118,4 +118,26 @@ Core changelog is treated as a "sub manual" of the core manual. To index it, jus

To avoid duplicates, the search indexes the Core changelog only from the "main" version/branch of the core documentation.
E.g. when you run ``./bin/console docsearch:import c/typo3/cms-core/main/``, the changelog for all versions will be indexed,
but if you run ``./bin/console docsearch:import c/typo3/cms-core/12.4/``, the changelog will NOT be indexed.

Excluded and ignored files and folders
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are several files and folders that are excluded from indexing by default.
You can find them in the ``services.yml`` file in the ``docsearch`` section.

If you want to exclude more files or folders, you can add them to the ``excluded_directories`` section.

There are also specific places in the code where files or folders are ignored.

Inside the ``Manual::getFilesWithSections()`` method, the Finder is configured to ignore several files and folders.
In the same place, if the indexed package is ``typo3/cms-core``, the ``Changelog`` folder is excluded from indexing,
as it will be indexed as a part of the TYPO3 core manual (see ``Manual::getSubManuals()`` for more details).

Since ``typo3/cms-core`` is a special package for core manuals, only the manuals from the ``main`` version should be indexed.
To achieve this, the ``DirectoryFinderService::getFolderFilter() ... isNotIgnoredPath()`` method is used.
It checks whether the processed directory is inside ``/c/typo3/cms-core/``; if the version is not ``main``, the whole directory (the other version) is ignored.
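
The version check described above can be sketched as follows. This is only an illustration under stated assumptions: the function name ``isIgnoredCoreVersion()`` is hypothetical, and the real ``isNotIgnoredPath()`` logic in ``DirectoryFinderService`` may be structured differently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not the project's actual code): returns true when a
 * directory belongs to typo3/cms-core but is not the "main" version, which
 * is the case the README says should be ignored.
 */
function isIgnoredCoreVersion(string $path): bool
{
    // Normalise Windows separators so the pattern only has to handle "/".
    $normalized = str_replace('\\', '/', $path);

    // Capture the version segment that follows /c/typo3/cms-core/.
    if (preg_match('#/c/typo3/cms-core/([^/]+)#', $normalized, $matches) !== 1) {
        return false; // Not a core manual, so this rule does not apply.
    }

    return $matches[1] !== 'main';
}

var_dump(isIgnoredCoreVersion('/docs/c/typo3/cms-core/main/en-us')); // bool(false)
var_dump(isIgnoredCoreVersion('/docs/c/typo3/cms-core/12.4/en-us')); // bool(true)
var_dump(isIgnoredCoreVersion('/docs/p/vendor/package/1.0/en-us'));  // bool(false)
```

The sample paths are made up for the example; only the ``/c/typo3/cms-core/`` segment and the ``main`` version name come from the README text.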

The ``ImportManualHTMLService::importSectionsFromManual()`` method checks whether the file contains the
``<meta name="x-typo3-indexer" content="noindex">`` meta tag. If the tag exists, the file is skipped.
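
To show how a ``noindex`` page would be detected, here is a minimal, dependency-free sketch. It deliberately swaps Symfony's DomCrawler (used in the commit) for PHP's built-in ``DOMDocument`` and ``DOMXPath``; the function name ``hasMetaTag()`` and the sample page are assumptions made for this example, not part of the project.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for the meta tag check, built on DOMDocument and
 * DOMXPath instead of Symfony's DomCrawler. When $content is null, only
 * the tag's name has to match.
 */
function hasMetaTag(string $html, string $name, ?string $content = null): bool
{
    // DOMDocument::loadHTML() rejects empty input, so bail out early.
    if (trim($html) === '') {
        return false;
    }

    $document = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);
    // Compare attribute values in PHP rather than building them into the
    // XPath expression, so quotes in $name or $content cannot break it.
    foreach ($xpath->query('//meta[@name]') as $meta) {
        if ($meta->getAttribute('name') !== $name) {
            continue;
        }
        if ($content === null || $meta->getAttribute('content') === $content) {
            return true;
        }
    }

    return false;
}

$page = '<html><head><meta name="x-typo3-indexer" content="noindex"></head>'
    . '<body><p>Hidden page</p></body></html>';

var_dump(hasMetaTag($page, 'x-typo3-indexer', 'noindex')); // bool(true)
var_dump(hasMetaTag($page, 'x-typo3-indexer', 'index'));   // bool(false)
var_dump(hasMetaTag($page, 'x-typo3-version'));            // bool(false)
```

A page rendered with this tag in its ``<head>`` would be skipped by the importer, while pages without the tag, or with a different ``content`` value, would still be indexed.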

2 changes: 1 addition & 1 deletion src/Dto/Manual.php
@@ -67,7 +67,7 @@ public function getFilesWithSections(): Finder
            ->in($this->getAbsolutePath())
            ->name('*.html')
            ->notName(['search.html', 'genindex.html', 'Targets.html', 'Quicklinks.html'])
-            ->notPath(['_buildinfo', '_images', '_panels_static', '_sources', '_static', 'singlehtml', 'Sitemap']);
+            ->notPath(['_buildinfo', '_images', '_panels_static', '_sources', '_static', 'singlehtml']);

        if ($this->getTitle() === 'typo3/cms-core') {
            $finder->notPath('Changelog');
3 changes: 3 additions & 0 deletions src/Service/ImportManualHTMLService.php
@@ -36,6 +36,9 @@ private function importSectionsFromManual(Manual $manual): void
        $this->dispatcher->dispatch(new ManualStart($files), ManualStart::NAME);

        foreach ($files as $file) {
            if ($this->parser->checkIfMetaTagExistsInFile($file, 'x-typo3-indexer', 'noindex')) {
                continue;
            }
            $this->importSectionsFromFile($file, $manual);
            $this->dispatcher->dispatch(new ManualAdvance(), ManualAdvance::NAME);
        }
16 changes: 16 additions & 0 deletions src/Service/ParseDocumentationHTMLService.php
@@ -9,6 +9,22 @@ class ParseDocumentationHTMLService
{
    private bool $newRendering = true;

    // Returns true if the file contains a <meta> tag with the given name
    // and, when $content is provided, the given content attribute.
    public function checkIfMetaTagExistsInFile(SplFileInfo $file, string $name, ?string $content = null): bool
    {
        $fileContent = $file->getContents();

        $selector = sprintf('meta[name="%s"]', $name);

        if ($content !== null) {
            $selector .= sprintf('[content="%s"]', $content);
        }

        $crawler = new Crawler($fileContent);
        $metaTags = $crawler->filter($selector);

        return (bool) $metaTags->count();
    }

    public function getSectionsFromFile(SplFileInfo $file): array
    {
        $fileContents = $file->getContents();
48 changes: 48 additions & 0 deletions tests/Unit/Service/ParseDocumentationHTMLServiceTest.php
@@ -12,6 +12,54 @@ class ParseDocumentationHTMLServiceTest extends TestCase
{
    use ProphecyTrait;

    public function testMetaTagExistsByNameOnly(): void
    {
        $fileContent = '<meta name="x-typo3-indexer" content="index">';
        $file = $this->prophesize(SplFileInfo::class);

        $file->getContents()->willReturn($fileContent);
        $subject = new ParseDocumentationHTMLService();
        $result = $subject->checkIfMetaTagExistsInFile($file->reveal(), 'x-typo3-indexer');

        $this->assertTrue($result);
    }

    public function testMetaTagExistsByNameAndContent(): void
    {
        $fileContent = '<meta name="x-typo3-indexer" content="noindex">';
        $file = $this->prophesize(SplFileInfo::class);

        $file->getContents()->willReturn($fileContent);
        $subject = new ParseDocumentationHTMLService();
        $result = $subject->checkIfMetaTagExistsInFile($file->reveal(), 'x-typo3-indexer', 'noindex');

        $this->assertTrue($result);
    }

    public function testMetaTagDoesNotExistByName(): void
    {
        $fileContent = '<meta name="x-typo3-indexer" content="index">';
        $file = $this->prophesize(SplFileInfo::class);
        $file->getContents()->willReturn($fileContent);

        $subject = new ParseDocumentationHTMLService();
        $result = $subject->checkIfMetaTagExistsInFile($file->reveal(), 'x-typo3-version');

        $this->assertFalse($result);
    }

    public function testMetaTagDoesNotExistByContent(): void
    {
        $fileContent = '<meta name="x-typo3-indexer" content="index">';
        $file = $this->prophesize(SplFileInfo::class);
        $file->getContents()->willReturn($fileContent);

        $subject = new ParseDocumentationHTMLService();
        $result = $subject->checkIfMetaTagExistsInFile($file->reveal(), 'x-typo3-indexer', 'noindex');

        $this->assertFalse($result);
    }

    /**
     * @test
     * @throws Exception