Tag Archives: AEM Solutions

AEM Solution: AEM Author activity reports

AEM lacks many fundamental CMS features, and one critical gap is author activity reporting. AEM does ship several reports (disk usage, user activity, page activity, workflow instances and so on), but in practice none of them cover the basics of activity reporting.

In my opinion, without ACS Commons, AEM as a CMS offers few operational capabilities beyond the Touch UI. Every team ends up developing custom solutions to support operational work; migrating content from one environment to another is one example. I will cover more of these gaps later. For now, let's explore the reporting feature in AEM.

Scenario: Why a CMS Needs Activity Reporting

Suppose many websites and brands are hosted on one AEM author environment, and multiple content teams are authoring content at the same time. In a large team, deleting and modifying pages is routine activity, and teams often struggle to find out who modified or deleted their pages. The biggest question is how to restore the content, but keeping track of routine activity is essential in any case.

Page Activity Report Solution

AEM provides a built-in Page Activity Report, but it lacks the following basic features:

  • It is slow to respond and provides only very basic information.
  • There is no filtering by date, author, etc.
  • There is no querying feature.
  • There is no way to check which section of a page was modified.

Solutions

The AEM OOTB (out-of-the-box) Page Activity Report can be helpful if you know the page name or title and want to track that page. In the snapshot above, the filter settings provide a way to look up the page.

Custom Solution using PageEvent Handler

Here is one custom solution for tracking all page events in AEM: register a PageEvent handler and keep pushing the activities into JCR nodes or another storage.

import java.util.Iterator;

import org.apache.felix.scr.annotations.Component;
import org.apache.felix.scr.annotations.Property;
import org.apache.felix.scr.annotations.Service;
import org.osgi.service.event.Event;
import org.osgi.service.event.EventHandler;

import com.day.cq.dam.api.DamEvent;
import com.day.cq.wcm.api.PageEvent;
import com.day.cq.wcm.api.PageModification;

@Component
@Service
@Property(name = "event.topics", value = {DamEvent.EVENT_TOPIC, PageEvent.EVENT_TOPIC})
public class PageActivityReport implements EventHandler {

    /* PageModification.ModificationType values:
       CREATED, DELETED, MODIFIED, MOVED, VERSION_CREATED, RESTORED */

    @Override
    public void handleEvent(Event event) {
        // fromEvent returns null for non-page events, e.g. the DAM topic subscribed above.
        PageEvent pageEvent = PageEvent.fromEvent(event);
        if (pageEvent == null) {
            return;
        }
        Iterator<PageModification> modifications = pageEvent.getModifications();
        while (modifications.hasNext()) {
            PageModification modification = modifications.next();
            // ModificationType is an enum, so switch on it instead of comparing strings.
            switch (modification.getType()) {
                case CREATED:
                    // Log it or write code to save created pages.
                    break;
                case DELETED:
                    // Log it or save deleted pages. A notification or alert can be triggered here.
                    break;
                case MODIFIED:
                    // Log it or write code to save modified pages.
                    break;
                default:
                    // MOVED, VERSION_CREATED, RESTORED: handle as needed.
                    break;
            }
        }
    }
}

The code above has a separate block for each event type where custom reporting code can go. One way to record these activities is to create a simple JCR node for each activity. A data model for reporting could look like this.

The node path could be: /page-report/<today's date as yyyy/mm/dd>/<current hour>/<page path with slashes replaced by hyphens>/<author>

  • path: /content/abc/en/example.html
  • pageTitle: <title of the page>
  • event: delete/modify/moved
  • activityBy: the user who performed the activity
  • timestamps: <use a consistent format so that it can be queried>
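Here is a minimal sketch of this data model in plain Java with no AEM dependencies. The helper below builds the record path and the property map for one activity. The class and method names are hypothetical. Actually writing the node, for example through a service-user session, is left out.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for the /page-report data model; it only builds strings and maps. */
public class ReportPathBuilder {

    // '/' is emitted literally, so this yields yyyy/MM/dd/HH folders.
    private static final DateTimeFormatter DAY_HOUR_FOLDERS = DateTimeFormatter.ofPattern("yyyy/MM/dd/HH");

    /** Builds /page-report/yyyy/MM/dd/HH/<page-path-with-hyphens>/<author>. */
    public static String buildReportPath(String pagePath, String author, LocalDateTime time) {
        // Drop leading slashes, then replace the remaining ones with hyphens.
        String flattened = pagePath.replaceAll("^/+", "").replace('/', '-');
        return "/page-report/" + time.format(DAY_HOUR_FOLDERS) + "/" + flattened + "/" + author;
    }

    /** Property values for one activity record, keyed by the field names of the data model. */
    public static Map<String, String> buildProperties(String pagePath, String pageTitle,
                                                      String event, String author,
                                                      LocalDateTime time) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("path", pagePath);
        props.put("pageTitle", pageTitle);
        props.put("event", event);
        props.put("activityBy", author);
        // ISO-8601 text; a Date-typed JCR property would be an alternative (assumption).
        props.put("timestamps", time.format(DateTimeFormatter.ISO_LOCAL_DATE_TIME));
        return props;
    }
}
```

For example, a deletion of /content/abc/en/example by user jdoe at 14:30 on 17 May 2024 would be stored under /page-report/2024/05/17/14/content-abc-en-example/jdoe.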

Final Thoughts

The above solution can be extended to other types of reporting, for example tracking asset activity. One challenge in scaling it is storing the activity records as JCR nodes and still fetching them quickly. The data model above also needs more thought, based on what the queries for the final reports will look like.
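To make the query side more concrete, here is a hypothetical helper that builds a JCR-SQL2 statement over those records. It assumes the records are nt:unstructured nodes under /page-report that carry the activityBy and event properties from the model above. The resulting string would be passed to the JCR QueryManager with the Query.JCR_SQL2 language.

```java
/**
 * Hypothetical JCR-SQL2 builder for the /page-report records.
 * Assumes records are nt:unstructured nodes; adjust the node type to what you actually store.
 */
public class ReportQueryBuilder {

    /** Quotes a JCR-SQL2 string literal, escaping single quotes by doubling them. */
    public static String literal(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Finds activity records below one date folder, filtered by author and event type. */
    public static String byAuthorAndEvent(String dateFolder, String author, String event) {
        return "SELECT * FROM [nt:unstructured] AS r"
                + " WHERE ISDESCENDANTNODE(r, " + literal(dateFolder) + ")"
                + " AND r.[activityBy] = " + literal(author)
                + " AND r.[event] = " + literal(event);
    }
}
```

Because the date folders are part of the path, narrowing the ISDESCENDANTNODE root (a day, or a single hour) keeps each query small. Designing the path and the query together like this is the kind of thought the data model needs.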

AEM Solution: Creating a Valid & Unique AEM JCR Name Programmatically

The objective of this post is to explain how the AEM PageManager API creates a valid, unique JCR node or page name. It also covers which API to use if you need to derive the same page name programmatically.

Scenario

Consider a scenario where article pages are created automatically in the AEM content hierarchy, with some promotional content under those pages. There is also a product detail page that renders promotional content based on product data.

The problem arises when you have to determine the promotional content programmatically from some product-level information. Matching the appropriate promotional content and rendering it on product detail pages isn't straightforward.

Solution

To solve this problem, all you have to do is resolve the promotional content page as a resource. That is possible if your code can find the content page in the JCR content hierarchy.

Consider this example. The title of the promotional/article content is "This is a dummy content article page". The same page created in AEM would have a path like this:

/content/<your-website>/en/articles/this-is-a-dummy-content-article-page

The example below shows how to create a valid JCR resource/node/page name from a given article title.

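#Simple Example Class
import org.apache.commons.lang3.StringUtils;

import com.day.cq.commons.jcr.JcrUtil;

public class UniqueValidPageName {
    // Maps the trimmed title to a valid JCR name, using hyphens as the replacement character.
    public static String getValidPageName(final String articleTitle) {
        return JcrUtil.createValidName(StringUtils.trim(articleTitle), JcrUtil.HYPHEN_LABEL_CHAR_MAPPING);
    }
}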
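If you want to unit-test the title-to-path matching without an AEM runtime, the class below is a simplified approximation of the idea. It is not JcrUtil's actual character mapping, which covers many more characters. The "-0", "-1" suffix scheme for uniqueness is a hypothetical choice and not necessarily the one PageManager uses. Inside AEM, JcrUtil also offers createValidChildName(Node, String), which additionally checks existing child names. Confirm it against the Javadoc of your AEM version.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical, simplified stand-in for JcrUtil.createValidName(..., HYPHEN_LABEL_CHAR_MAPPING). */
public class PageNameSketch {

    /** Lower-cases the title and turns every run of characters outside [a-z0-9] into one hyphen. */
    public static String toPageName(String title) {
        return title.trim().toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "-")
                .replaceAll("^-+|-+$", "");
    }

    /** Appends an increasing counter until the name no longer clashes with an existing sibling. */
    public static String toUniqueName(String title, Set<String> existingSiblings) {
        String base = toPageName(title);
        String candidate = base;
        int counter = 0;
        while (existingSiblings.contains(candidate)) {
            candidate = base + "-" + counter++;
        }
        return candidate;
    }
}
```

With this sketch, "This is a dummy content article page" maps to this-is-a-dummy-content-article-page, matching the path shown above.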

Final Thoughts 

This solution looks very simple, but it solves a significant problem. Your code can derive a page's path from its title and resolve the content directly.

AEM Upgrade 6.4: Jetty, Cookies and RFC6265 Compliance

When upgrading from an AEM version earlier than 6.4 to AEM 6.4, any servlet or component that sets a cookie containing certain characters in the HTTP response may fail. You may then see the following exception in the logs.

RFC6265 Cookie values may not contain character

What does this error message suggest?

AEM 6.4 uses a newer version of Jetty as its servlet container, and Jetty has changed its cookie policy. Under the new policy, cookie values cannot contain special characters or separators unless they are encoded.

Before this change, Jetty supported Version=1 cookies as defined in RFC 2109 (and continued in RFC 2965). These allow special/reserved characters (control characters, separators, etc.) to be enclosed in double quotes when declared in a Set-Cookie response header, as in the example below.

Set-Cookie: foo="bar;baz";Version=1;Path="/secur"

This header was added to the HTTP response using the following calls.

Cookie cookie = new Cookie("foo", "bar;baz");
cookie.setPath("/secur");
response.addCookie(cookie);

How to fix the cookie problem?

Look at the simple code snippet below. Encode the cookie value when you set it, and decode it wherever you read it.

Cookie cookie = new Cookie("foo", URLEncoder.encode("bar;baz", "utf-8"));

How to decode in Javascript & Java?

Use the code snippets below:

#Java
URLDecoder.decode(request.getCookie("foo").getValue(), "UTF-8");

#Javascript
decodeURIComponent($.cookie("foo"));
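One pitfall is worth checking here. URLEncoder encodes a space as "+", while JavaScript's decodeURIComponent leaves "+" untouched, so a value containing spaces would not round-trip to the browser. Below is a minimal sketch of a codec (the class name is hypothetical) that emits %20 instead, so the Java and JavaScript decoders above read the same value.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

/** Hypothetical cookie value codec that both URLDecoder and decodeURIComponent can read. */
public class CookieValueCodec {

    /** URL-encodes the value so it contains no separators such as ';' or ','. */
    public static String encode(String raw) {
        try {
            // Literal '+' is already encoded as %2B, so every remaining '+' stands for a space.
            return URLEncoder.encode(raw, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Reverses encode(); URLDecoder understands %20 as well as '+'. */
    public static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

You would then set the cookie with new Cookie("foo", CookieValueCodec.encode("bar;baz")).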

AEM Solution: How to Clear the Dispatcher Cache Yourself?

In any web application, caching contributes significantly to overall performance. At the same time, developers want to test their changes and be able to clear the cache frequently. AEM content is cached in two places: the web server (e.g. Apache) and the CDN (e.g. Akamai). AEM provides the Dispatcher module, which runs inside the web server. It handles caching as well as the cache-flush requests that result from activations in the AEM author environment.

Whenever a content author activates a page path in the AEM author environment, an HTTP request goes to the AEM publish server. The publish server then sends another HTTP request, a flush, to the Dispatcher module via the web server. This flush request invalidates the cache for the requested path; the Dispatcher tracks invalidation by updating the timestamp of its .stat file. A detailed explanation of how the Dispatcher works isn't needed here, so we can skip that part.

Cache clearing in AEM depends entirely on content paths and page types. For example, to clear the cache of an AEM page or image, you can simply republish the same page or image from AEM author. The cache gets refreshed, provided a Dispatcher flush agent is configured on the publish server.

Problems in cache clearing

Clearing the cache of AEM pages and assets may seem easy, but it is problematic in many cases. Some of them are listed here.

  • Clearing the cache of a minified JavaScript file. The file path and the client library path do not match at all.
  • Clearing the cache of a request to a servlet path that does not exist in the real content hierarchy, e.g. /bin/myapp/servlet/abc.html.
  • Clearing the cache of a vanity URL.
  • Clearing the cache of a URL whose path differs from the content path because AEM mappings resolve it. For instance, the live URL is /myapp/abc/xyz.html but the content hierarchy is /content/myapp/en/1/abc/xyz.html.

What are the traditional solutions?

  • Ask someone with access to log in to the web server and clear the cache manually. But here is the catch: how many times can you ask if you are testing your JavaScript code?
  • Run a curl command to clear the cache. For this, you need to know the web servers' Dispatcher IP/domain, etc. If there are multiple web servers, you have to clear the cache one server at a time.
  • Run a Jenkins job that clears the entire cache. This could be problematic if you do it in stage or prod.
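For reference, the curl option boils down to an HTTP POST against the Dispatcher's invalidation handler (/dispatcher/invalidate.cache) with the CQ-Action and CQ-Handle headers. The sketch below uses the Java 11 HttpClient to build one such request per web server. The host names are placeholders. The Dispatcher must also allow the calling client in its configuration for the flush to be accepted.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;

/** Sketch of a manual Dispatcher flush; base URLs and paths are placeholders. */
public class DispatcherFlush {

    /** Builds one invalidation request per web server for the given content path. */
    public static List<HttpRequest> buildFlushRequests(List<String> dispatcherBaseUrls, String path) {
        List<HttpRequest> requests = new ArrayList<>();
        for (String base : dispatcherBaseUrls) {
            requests.add(HttpRequest.newBuilder(URI.create(base + "/dispatcher/invalidate.cache"))
                    .header("CQ-Action", "Activate")   // same action an activation triggers
                    .header("CQ-Handle", path)         // path whose cache should be invalidated
                    .POST(HttpRequest.BodyPublishers.noBody())
                    .build());
        }
        return requests;
    }

    /** Sends the requests one by one; needs network access to every web server. */
    public static void flush(List<HttpRequest> requests) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        for (HttpRequest request : requests) {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(request.uri() + " -> " + response.statusCode());
        }
    }
}
```

Looping over all web servers removes the "one server at a time" chore. You still need to know every Dispatcher's address, which is exactly the drawback described above.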

Easiest Solution

None of the above problems is that bad, and there are solutions to each of them. However, as a developer, I would like a quick and easy way to clear the cache myself. The AEM Dispatcher module purges the cache based on the path, and you can use this behavior to clear the cache of any file, path, or asset. Follow the steps below to clear the cache without anyone's help.

Let's take the example of purging the cache of your minified JavaScript file. The path of the file is /etc/designs/myapp/core.mini.js.

  • Create a file with the same name and path in author.
  • Activate that file.
  • The Dispatcher invalidates the cached file and starts serving your dummy file as the new file.
  • Deactivate the dummy file right away. This is required because your dummy file does not contain the correct content or code.
  • Once the file is deactivated, client library or AEM path resolution happens as normal.
  • You can keep the dummy file in author for future use, or delete it.

This approach works with any other path or file, be it JSON, XML, HTML, etc. The only condition is that the path you want to clear from the cache must first be created in author.

Final Thoughts

These solutions have been tested, but I can't say they solve every problem. You can post your questions in the comments section, and I will look into them and get back to you.

AEM Security: How to secure the AEM application?

Overview

Every development team working with Adobe Experience Manager (AEM) follows a set of security practices. Most of them are straightforward best practices recommended by Adobe, but there are many other security issues of equal importance.

So, let's look at how to secure your application by putting the right rules in place in your AEM environment.

All other recommendations from the Open Web Application Security Project (OWASP) should also be applied. The recommendations below are specific to AEM technology and AEM infrastructure.

Many problems are unknown to AEM solution providers and put the whole system at risk. Here is one example that illustrates security problems in AEM.

Use the Google query below to find out whether your author instance is indexed by Google. It is a very basic query. Try it, and you may be surprised at how many author instances are open to exploits. You might wonder how anyone could log in to those author instances. That is fairly easy once you know who authored the pages.

Google Query: inurl:aemauthor

AEM Author Security

First and foremost, make sure your AEM author instance isn't discoverable by search engines and isn't accessible from outside the intranet without a VPN. Follow the author security guidelines below:

  • Keep a robots.txt for all your domains, including the authoring environment, and make sure Google does not index the author domain.
  • Enable HTTPS on AEM author.
  • Change the admin password on every AEM instance (i.e. server).
  • Create groups for assigning access and follow the principle of least privilege: instead of denying access on many hierarchies, allow only what each group needs.
  • Create a separate replication user for the replication agent configuration. The admin user should not be used for replication anywhere.
  • Limit the number of users in admin groups.
  • WebDAV, CRX Explorer and CRXDE on production author should be disabled or limited to certain users.

AEM Publish Security

As with AEM author, publish instances should not be accessible from outside the intranet, and connections to web servers, author, etc. should be internal. The most important part of publish security is handling request inputs and using the proper session for each request. Serving requests with an admin session or another privileged user is a big problem.

Suppose you have to read some data that the anonymous user has no permission to access. Avoid using an admin session; instead, have a dedicated user that reads/writes that content for those specific requests. Follow these other guidelines for AEM publish security:

  • Check anonymous permissions and make sure not every directory is accessible to the anonymous user. Even under /etc/designs, cloud services configurations, etc., proper permissions should be set up.
  • The Apache Sling Referrer Filter must be configured to block unwanted requests to publish.
  • The cross-site request forgery (CSRF) protection framework should be enabled to filter requests.
  • All default tools (CRX Explorer, CRXDE, WebDAV, etc.) should be disabled.
  • No one should be able to access a publish server directly, or to install packages on it directly.

Dispatcher security

When thinking of AEM security, most of us think only of the rules and filters in the dispatcher.any configuration file. But there are many more cases where things go wrong if you have not taken care of security:

  • Do not configure a Dispatcher flush agent on AEM author. If one is enabled, make the flush call over HTTPS; otherwise the author flush agent exposes your web server IP and credentials.
  • Limit the request header information. Request headers are passed along with every request to AEM publish, depending on the Dispatcher configuration.
  • Do not allow cross-origin requests, and set same-origin headers (for example X-Frame-Options: SAMEORIGIN) at the web server level.
  • Perform proper input validation on POST requests, and have the Dispatcher filter allow only specific POST requests.
  • Define which selectors and URL extensions are cacheable; not every selector or extension should be. Otherwise, DoS or DDoS attacks are very easy to carry out against an AEM application.
  • Website URLs should not expose internal directories.
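The selector and extension restrictions above belong in the Dispatcher /filter and /cache rules. Purely as an illustration of the allow-list idea, here is a hypothetical Java check. It accepts a request path only when its extension and every selector are explicitly allowed, which rejects cache-busting variants such as page.1.2.3.html. It only inspects the last path segment and ignores suffixes and query strings.

```java
import java.util.Arrays;
import java.util.Set;

/** Hypothetical allow-list check for selectors and extensions in a request path. */
public class SelectorAllowList {

    private final Set<String> allowedSelectors;
    private final Set<String> allowedExtensions;

    public SelectorAllowList(Set<String> allowedSelectors, Set<String> allowedExtensions) {
        this.allowedSelectors = allowedSelectors;
        this.allowedExtensions = allowedExtensions;
    }

    /** Returns true only if the extension and every selector are explicitly allowed. */
    public boolean isAllowed(String requestPath) {
        String lastSegment = requestPath.substring(requestPath.lastIndexOf('/') + 1);
        String[] parts = lastSegment.split("\\.");
        if (parts.length < 2) {
            return false; // no extension at all
        }
        String extension = parts[parts.length - 1];
        if (!allowedExtensions.contains(extension)) {
            return false;
        }
        // Everything between the resource name and the extension is a selector.
        return Arrays.stream(parts, 1, parts.length - 1).allMatch(allowedSelectors::contains);
    }
}
```

An allow-list is used rather than a deny-list because attackers can generate unlimited new selector combinations. With an allow-list, each new selector has to be added deliberately.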

Final Thoughts

We have to secure both the infrastructure and the important environments. Once you have secured author and publish and have a proper Dispatcher configuration, you have a better chance of protecting your application. Application security is another aspect; for that, refer to Adobe's security recommendations.