How to make Drupal Search API index the full rendered HTML content

INTRODUCTION

Drupal is the content management system of choice for a lot of government bodies in Western Australia. While it’s a great little system, I still feel it needs a lot of improvement, especially around search.

One thing that struck me is that it does not index the final rendered HTML content. It can index content fields fine, but not the final rendered HTML, which is a real shortcoming in my opinion. A typical site has a lot of custom content blocks dropped onto the different content placeholders of its pages. Since the only content indexed is the main page content field, a search for text that lives inside one of those rendered blocks won’t return the page(s) where the block appears.

I installed Search API and tried both the default index and Solr, to no avail. The problem is not the search technology: the content Drupal passes to these search indexers does not contain the full rendered HTML. So no matter which search technology you use, it can’t return the right results, because the content was never indexed in the first place.

RESOLUTION

Finally, after some fiddling around, I found a workable solution! And no, it doesn’t involve installing Google search, just a modification to one Search API file.

Search API provides a search field called “Rendered Item”. You can add this field in the Fields area of your index, as well as in your Search API Pages. The idea behind it is great: it indexes the HTML output of the item as rendered in a configured view mode. I hijacked this field so that, for nodes, it indexes the whole page’s HTML output instead of just that rendered item.

The Rendered Item processor lives in this file:

\modules\search_api\src\Plugin\search_api\processor\RenderedItem.php

This is the method that actually adds the field value to the index:

public function addFieldValues(ItemInterface $item) 

So I hijacked it to include the rendered page HTML, which I download with this PHP function:

file_get_contents
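A bare file_get_contents call emits a warning and returns false when the request fails, and otherwise falls back to PHP’s default socket timeout. Below is a minimal sketch of a wrapper that turns failure into NULL. The helper name fetch_page_html and the 10-second timeout are assumptions for illustration, not part of Search API or Drupal:

```php
<?php

/**
 * Fetches a URL (or local path) and returns its contents, or NULL on failure.
 *
 * Hypothetical helper; the default timeout is an arbitrary assumption.
 */
function fetch_page_html(string $url, float $timeout = 10.0): ?string {
  // HTTP options only apply when $url uses the http/https wrapper.
  $context = stream_context_create([
    'http' => [
      'method' => 'GET',
      'timeout' => $timeout,
    ],
  ]);
  // Suppress the warning; failure is reported through the return value.
  $html = @file_get_contents($url, FALSE, $context);
  return $html === FALSE ? NULL : $html;
}
```

The caller can then skip addValue() whenever the result is NULL, rather than indexing an empty value.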

My method now looks like the code below. The parts I added are the SiteURL() helper and the if branch that handles node items.

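// Added to the RenderedItem class, since it is called as $this->SiteURL().
function SiteURL()
{
  $protocol = (!empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] !== 'off' || $_SERVER['SERVER_PORT'] == 443) ? "https://" : "http://";
  // Note the trailing slash on the returned base URL.
  $domainName = $_SERVER['HTTP_HOST'] . '/';
  return $protocol . $domainName;
}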

/**
 * {@inheritdoc}
 */
public function addFieldValues(ItemInterface $item) {
  $original_user = $this->currentUser->getAccount();

  // Switch to the default theme in case the admin theme is enabled.
  $active_theme = $this->getThemeManager()->getActiveTheme();
  $default_theme = $this->getConfigFactory()
    ->get('system.theme')
    ->get('default');
  $default_theme = $this->getThemeInitializer()
    ->getActiveThemeByName($default_theme);
  $this->getThemeManager()->setActiveTheme($default_theme);

  // Count of items that don't have a view mode.
  $unset_view_modes = 0;

  $fields = $this->getFieldsHelper()
    ->filterForPropertyPath($item->getFields(), NULL, 'rendered_item');
  foreach ($fields as $field) {

    // Node item IDs look like "entity:node/123:en".
    $idString = $item->getId() . '';

    if (strpos($idString, 'node/') !== false)
    {
      // Strip the datasource prefix and the ":en" suffix to get the node ID.
      $idString = str_replace("entity:node/", "", $idString);
      $idString = str_replace(":en", "", $idString);

      // SiteURL() already ends with "/", so no leading slash here.
      $html = \file_get_contents($this->SiteURL() . 'node/' . $idString);
      // file_get_contents() returns false on failure; don't index that.
      if ($html !== false) {
        $field->addValue($html);
      }
    }
    else
    {
      $configuration = $field->getConfiguration();

      // Change the current user to our dummy implementation to ensure we are
      // using the configured roles.
      $this->currentUser->setAccount(new UserSession(['roles' => $configuration['roles']]));

      $datasource_id = $item->getDatasourceId();
      $datasource = $item->getDatasource();
      $bundle = $datasource->getItemBundle($item->getOriginalObject());
      // When no view mode has been set for the bundle, or it has been set to
      // "Don't include the rendered item", skip this item.
      if (empty($configuration['view_mode'][$datasource_id][$bundle])) {
        // If it was really not set, also notify the user through the log.
        if (!isset($configuration['view_mode'][$datasource_id][$bundle])) {
          ++$unset_view_modes;
        }
        continue;
      }
      else {
        $view_mode = (string) $configuration['view_mode'][$datasource_id][$bundle];
      }

      // Build the render array for the configured view mode.
      $build = $datasource->viewItem($item->getOriginalObject(), $view_mode);
      $value = (string) $this->getRenderer()->renderPlain($build);
      if ($value) {
        $field->addValue($value);
      }

    }
  }

  // Restore the original user.
  $this->currentUser->setAccount($original_user);
  // Restore the original theme.
  $this->getThemeManager()->setActiveTheme($active_theme);

  if ($unset_view_modes > 0) {
    $context = [
      '%index' => $this->index->label(),
      '%processor' => $this->label(),
      '@count' => $unset_view_modes,
    ];
    $this->getLogger()->warning('Warning: While indexing items on search index %index, @count item(s) did not have a view mode configured for one or more "Rendered item" fields.', $context);
  }
}

Basically, I’m just telling Drupal: if the item being indexed is a content node, download the full page HTML output and add it to the index.

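if (strpos($idString, 'node/') !== false)
{
  // Strip the datasource prefix and the ":en" suffix to get the node ID.
  $idString = str_replace("entity:node/", "", $idString);
  $idString = str_replace(":en", "", $idString);

  // SiteURL() already ends with "/", so no leading slash here.
  $html = \file_get_contents($this->SiteURL() . 'node/' . $idString);
  // file_get_contents() returns false on failure; don't index that.
  if ($html !== false) {
    $field->addValue($html);
  }
}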

addValue() is the method that adds a string value to the field in the index.
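The two str_replace calls above only handle item IDs that end in ":en", so nodes in other languages would produce a broken URL. Here is a minimal sketch of a stricter parser, assuming item IDs always follow the "entity:node/<nid>:<langcode>" shape implied by the code above. parse_node_item_id is a hypothetical helper, not part of Search API:

```php
<?php

/**
 * Extracts the node ID and language code from a Search API item ID.
 *
 * Hypothetical helper. Assumes IDs shaped like "entity:node/123:en".
 * Returns NULL for anything that does not match that shape.
 */
function parse_node_item_id(string $item_id): ?array {
  // Capture the numeric node ID and the language code after the colon.
  if (preg_match('#^entity:node/(\d+):([A-Za-z0-9_-]+)$#', $item_id, $m) !== 1) {
    return NULL;
  }
  return ['nid' => (int) $m[1], 'langcode' => $m[2]];
}

$parsed = parse_node_item_id('entity:node/42:fr');
if ($parsed !== NULL) {
  // Relative path that could be appended to the site's base URL.
  $path = 'node/' . $parsed['nid'];
}
```

Items from other datasources return NULL and can fall through to the else branch with the stock rendering logic.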

DONE!

Delete all items from the index, then do a full re-index 🙂

NOTE: This changes one of Search API’s own files, so the change will be overwritten whenever you upgrade the module. Make sure you re-apply it after each upgrade.

 

Hope this helps,

Tommy

 

Written by

A web solution expert who has passion in website technologies. Tommy has been in the web industry for more than 10 years. He started his career as a PHP developer and has now specialized in ASP.NET, SharePoint and MS CRM. During his career he has also been in many roles: system tester, business analyst, deployment and QA manager, team and practice leader and IT manager.

