Category Archives: AEM 6

AEM Solution: AEM Author activity reports

AEM as a CMS lacks many fundamental features, and one of the critical ones is author activity reporting. AEM does ship reports (disk usage, user activity, page activity, workflow instances, and so on), but in practice none of them covers basic reporting needs.

In my opinion, without the ACS Commons tools, AEM as a CMS hasn't provided many capabilities except the stupid Touch UI. Every team has to develop many custom solutions to support operational work; one example is migrating content from one environment to another. I will talk about more AEM issues in other posts. For now, let's explore the reporting feature in AEM.

Scenarios/Need of reporting feature in CMS

Let's say many websites and brands are hosted in one AEM author environment and multiple content teams are adding content at the same time. Deleting and modifying pages is normal activity for a large team, and teams often struggle to find out who modified or deleted their pages. The bigger question is how to restore the content, but keeping track of normal activity is essential too.

Page Activity Report Solution

AEM has a reporting capability called the Page Activity Report. It lacks the following basic features:

  • It is unresponsive and provides only very basic information.
  • Filtering by date, author, etc. isn't provided.
  • Querying isn't available.
  • There is no way to check which section of a page was modified.

Solutions

The AEM OOTB (out-of-the-box) Page Activity Report can be helpful if you know the page name or title and want to track that page. Its filter setting provides a way to find the page.

Custom Solution using PageEvent Handler

Here is one custom solution to track all page events in AEM: register one PageEvent handler and keep pushing activities into JCR nodes or other storage.

import java.util.Iterator;

import org.apache.felix.scr.annotations.Component;
import org.apache.felix.scr.annotations.Property;
import org.apache.felix.scr.annotations.Service;
import org.osgi.service.event.Event;
import org.osgi.service.event.EventHandler;

import com.day.cq.dam.api.DamEvent;
import com.day.cq.wcm.api.PageEvent;
import com.day.cq.wcm.api.PageModification;

@Component
@Service
@Property(name = "event.topics", value = {DamEvent.EVENT_TOPIC, PageEvent.EVENT_TOPIC})
public class PageActivityReport implements EventHandler {

    @Override
    public void handleEvent(Event event) {
        // fromEvent returns null when the OSGi event is not a page event (e.g. a DAM event).
        PageEvent pageEvent = PageEvent.fromEvent(event);
        if (pageEvent == null) {
            return;
        }
        Iterator<PageModification> modifications = pageEvent.getModifications();
        while (modifications.hasNext()) {
            PageModification modification = modifications.next();
            // ModificationType is an enum, so compare constants directly instead of strings.
            switch (modification.getType()) {
                case CREATED:
                    // Log it or write code to save created pages.
                    break;
                case DELETED:
                    // Log it or write code to save deleted pages. A notification or alert can be triggered from here.
                    break;
                case MODIFIED:
                    // Log it or write code to save modified pages.
                    break;
                default:
                    // Other types: MOVED, VERSION_CREATED, RESTORED.
                    break;
            }
        }
    }
}

In the above code, there are event-specific blocks where custom reporting code can go. One way to record these activities is to create a simple JCR node for each activity. The data model for reporting could be as follows.

Path of the node could be: /page-report/<today's date in yyyy/mm/dd>/<current time in hours>/<page-path-replace_slashwith_hyphen>/<author>

  • path: /content/abc/en/example.html
  • pageTitle: <title of the page>
  • event: delete/modify/moved
  • activityBy: who performed any activity
  • timestamps: <Format should be correct so that it can be queried>
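
The node path from the data model above could be assembled roughly like this. This is only a sketch in plain Java: the class name ReportPathBuilder and the method build are made up for illustration, and it assumes the hour is written as a two-digit 24-hour value and that the leading slash of the page path is dropped before slashes are replaced with hyphens.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of any AEM API.
final class ReportPathBuilder {

    // yyyy/MM/dd/HH yields one folder level each for year, month, day and hour.
    private static final DateTimeFormatter DATE_AND_HOUR =
            DateTimeFormatter.ofPattern("yyyy/MM/dd/HH");

    /** Builds /page-report/<yyyy/MM/dd>/<HH>/<page-path-with-hyphens>/<author>. */
    static String build(String pagePath, String author, LocalDateTime when) {
        // Assumption: drop the leading slash so the segment does not start with a hyphen.
        String trimmed = pagePath.startsWith("/") ? pagePath.substring(1) : pagePath;
        String flattened = trimmed.replace('/', '-');
        return "/page-report/" + when.format(DATE_AND_HOUR) + "/" + flattened + "/" + author;
    }
}
```

Keeping the date and hour as separate path levels means a report for one day or one hour only has to read the nodes below a single folder.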

Final Thoughts

The above solution can be implemented and scaled to other types of reporting, for example tracking asset activity. One challenge in scaling it is keeping activity records in JCR nodes and fetching them quickly. The data model above also needs more thought, based on what the queries will look like when generating the final reports.

AEM Upgrade 6.4: Jetty, Cookies and RFC6265 Compliance

When upgrading from an AEM version earlier than 6.4 to AEM 6.4, any servlet or component that sets a cookie whose value contains special characters in the HTTP response may fail, and you may encounter the exception below in the logs.

RFC6265 Cookie values may not contain character

What does this error message suggest?

AEM 6.4 uses a newer version of Jetty as its servlet container, and Jetty has changed its cookie policy. The policy says that cookie values can't contain special characters or separators unless they are encoded.

Up until now Jetty has supported Version=1 cookies defined in RFC2109 (and continued in RFC2965) which allows for special/reserved characters (control, separator, et al) to be enclosed within double quotes when declared in a Set-Cookie response header: See below example.

Set-Cookie: foo="bar;baz";Version=1;Path="/secur"

Which was added to the HTTP Response headers using the following calls.

Cookie cookie = new Cookie("foo", "bar;baz");
cookie.setPath("/secur");
response.addCookie(cookie);

Solution to the cookie problem

The fix is simple, as the snippet below shows: encode the cookie value and decode it wherever you use it.

Cookie cookie = new Cookie("foo", URLEncoder.encode("bar;baz", "utf-8"));

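How to decode in JavaScript and Java?

Use the snippets below:

#Java (request is a SlingHttpServletRequest; getCookie returns null if the cookie is absent)
URLDecoder.decode(request.getCookie("foo").getValue(), "UTF-8");

#JavaScript ($.cookie comes from the jQuery Cookie plugin)
decodeURIComponent($.cookie("foo"));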
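
One detail to watch when the same cookie is read from both Java and JavaScript: URLEncoder writes a space as "+", while decodeURIComponent leaves "+" untouched. Below is a small sketch of an encode/decode pair that sidesteps this by emitting "%20" instead. The class name CookieValueCodec is made up for illustration; only java.net.URLEncoder and java.net.URLDecoder are real JDK APIs here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper for cookie values; not part of AEM or Jetty.
final class CookieValueCodec {

    /** Encodes a raw value so it contains no separators such as ';' or spaces. */
    static String encode(String raw) {
        try {
            // URLEncoder writes spaces as '+'; a literal '+' in the input is already
            // escaped as %2B, so this replace only touches encoded spaces.
            return URLEncoder.encode(raw, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    /** Reverses encode(). */
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }
}
```

With this, new Cookie("foo", CookieValueCodec.encode("bar;baz")) carries the value bar%3Bbaz, which both the Java and the JavaScript side can decode.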

AEM Security: How to secure the AEM application?

Overview

Every development team working with Adobe Experience Manager (AEM) follows a set of security practices. Most of them are fairly straightforward and are recommended by Adobe as best practices; however, there are many other security issues of equal importance.

So, let's look at how to secure your application by putting the right rules in place in your AEM environment.

All other recommendations from the Open Web Application Security Project (OWASP) should be applied as well. The recommendations below are specific to AEM technology and AEM infrastructure.

Many problems are unknown to the AEM solution provider and put the whole setup at risk. Here is one example that showcases the security problems in AEM.

Use the Google query below to find out whether your author instance is indexed by Google. It is a very basic query. Try it; you would be surprised to see how many author instances are open to exploits. You might wonder how anyone could log in to those authors. That is fairly easy once you know who has authored the pages.

Google Query: inurl:aemauthor

AEM Author Security

First and foremost, make sure your AEM author instance isn't searchable by search engines and isn't accessible from outside the intranet without VPN. Follow the author security guidelines below:

  • Keep a robots.txt for all your domains, including the authoring environment. Make sure Google does not index the author domain.
  • Enable HTTPS on AEM author.
  • Change the admin password on every AEM instance (i.e. server).
  • Create groups for assigning access and follow the least-privilege principle. Basically, instead of denying access on many hierarchies, allow only what each group needs.
  • Create a separate replication user for the replication agent configuration. Admin should not be used for replicating anywhere.
  • Limit the number of users in admin groups.
  • WebDAV, CRX Explorer, and CRXDE should be disabled on prod author or limited to certain users.

AEM Publish Security

As with AEM author, publish instances should not be accessible from outside the intranet, and connections to web servers, author, etc. should be internal connections. The most important part of publish security is handling request input and using proper request sessions. Serving requests with an admin session or a privileged user is a big problem.

If you have to read data that the anonymous user has no permission for, avoid using an admin session. Have a dedicated user that reads or writes the content for those requests. Follow these other guidelines for AEM publish security:

  • Check anonymous permissions and make sure not every directory is accessible to the anonymous user. Even under /etc (designs, cloud services, etc.), proper permissions should be set up.
  • The Apache Sling Referrer Filter must be configured to block unwanted publish requests.
  • The CSRF (cross-site request forgery) protection framework should be enabled to filter requests.
  • All default tools (CRX Explorer, CRXDE, WebDAV, etc.) should be disabled.
  • No one should be able to access the publish server directly or install packages on it directly.

Dispatcher security

When it comes to AEM security, most of us think only of the rules and filters in the dispatcher.any configuration file. But there are many more cases where things get ugly if you have not taken care of security:

  • Avoid configuring the dispatcher flush agent on AEM author. If it is enabled, use HTTPS calls for flushing the cache; otherwise the author flush agent exposes your web server IP and credentials.
  • Limit request header information. Request headers are passed to AEM publish with every request, based on the dispatcher configuration.
  • Do not allow cross-origin requests. Set the same-origin header at the web server level.
  • Validate input properly in POST requests, and let the dispatcher filter allow only specific POST requests.
  • Define which selectors and URL extensions are cacheable. Not every selector or extension should be cacheable; otherwise DoS or DDoS attacks are very easy to mount against an AEM application.
  • Website URLs should not expose internal directories.
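
The dispatcher itself is configured in dispatcher.any, but the allowlist idea for selectors and extensions can be illustrated in plain Java. The sketch below is not dispatcher or Sling code: the class CacheableUrlCheck is hypothetical, and it uses a simplified URL split (last path segment, split on dots, no suffix handling) rather than Sling's full URL decomposition.

```java
import java.util.Set;

// Hypothetical allowlist check; names and parsing rules are assumptions for illustration.
final class CacheableUrlCheck {

    private final Set<String> allowedSelectors;
    private final Set<String> allowedExtensions;

    CacheableUrlCheck(Set<String> allowedSelectors, Set<String> allowedExtensions) {
        this.allowedSelectors = allowedSelectors;
        this.allowedExtensions = allowedExtensions;
    }

    /** Returns true only if the extension and every selector are on the allowlists. */
    boolean isAllowed(String urlPath) {
        String lastSegment = urlPath.substring(urlPath.lastIndexOf('/') + 1);
        String[] parts = lastSegment.split("\\.");
        if (parts.length < 2) {
            return false; // no extension at all
        }
        if (!allowedExtensions.contains(parts[parts.length - 1])) {
            return false;
        }
        // Everything between the name (index 0) and the extension is a selector.
        for (int i = 1; i < parts.length - 1; i++) {
            if (!allowedSelectors.contains(parts[i])) {
                return false;
            }
        }
        return true;
    }
}
```

With "print" as the only allowed selector and "html" and "json" as allowed extensions, /content/site/en/page.print.html passes, while a request with a random selector such as /content/site/en/page.rnd123.html is rejected instead of producing a new cache entry.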

Final thought

We have to secure the infrastructure and the important environments. Once you have secured author, publish, and the dispatcher configuration, you have a better chance of protecting your application. Application security is another aspect; follow Adobe's recommendations for that.

AEM Solution: The easiest way to copy content from one AEM to another.

Moving content in AEM is a big, recurring task. In my personal opinion, it is a big task for everybody. Let me explain in detail. Consider a scenario where you want to move content from one AEM environment to another. The easy thing to do is use the AEM Package Manager: build a package on one AEM instance, download it, and install it somewhere else. An easy process? You may think so, but it is not. From a business perspective, the Package Manager tool totally sucks, for the following reasons:

Lack of basic features in Package Manager: Many basic features are missing. Some of them are:

  • There is no way to schedule a content package as a whole. If 100 pages need to be scheduled, each individual page must be scheduled for replication.
  • There is no way to move individual pages from one environment to another if those pages are parent pages in the content hierarchy. All the content below them has to be overwritten.
  • It is not easy to revert specific content installed by the Package Manager. Either the whole package content is reverted or nothing is.

Not easy to use for a non-technical person: The authoring team must have working knowledge of the Package Manager tool. You might ask, working knowledge? My answer is YES. Someone needs to know how to upload, build, install, download, uninstall, and so on, and needs access to packages, which someone can misuse.

Time-consuming and does not work in many cases: Downloading from one environment and uploading to another is old-fashioned and time-consuming. For heavy content (gigabytes in size), it does not even work.

So, here is the list of possible solutions:

  • TWC Grabbit is one of them. It was developed by one of our team members; however, I am not sure whether it works in all AEM versions. It has many dependencies and needs to be installed and managed on both source and destination. But it was quite a good one.
  • AEM Package Manager out of the box.
  • Copy the whole source crx-quickstart folder and overwrite the destination: not a feasible option if the content has to be moved between stage and production. It is also not a solution if you want to move only a few pages or images. It is not a bad solution for Dev and QA, but it comes with lots of maintenance once the content is overwritten.

The easiest way to move content regularly

All the above solutions require some level of additional maintenance; however, there is another, much easier solution. You need just two things: a servlet in your source code, and a replication agent on the source AEM instance that points to the destination. Follow the steps below.

Pros of this solution:

  • First, it is pretty easy and you can replicate any JCR path: a content package, one page or its child pages, one image or a set of images. If you replicate a content package, there is no need to install it in the destination environment. It is also helpful when you need just a few pages from stage in your QA or dev environment, not the whole content.
  • No dependencies and no installation: just one servlet and one replication agent, using the out-of-the-box API.
  • Pretty extensible. You can build a fancy UI on top of it and turn it into a tool.
  • Cross-environment replication, used only for content movement. Any environment can be a source or a destination. Having a separate replication agent just for copying content avoids replication queue issues.
  • The con is that it still uses the replication API rather than a fancy third-party solution.

NOTE: I have built a tool that solves all the issues a content package has, but I am not yet sure whether I can simply provide its source code here. Let me know if you need help or ideas to understand the full solution.

Agent configuration in the AEM source instance: The content source is the AEM instance (author or publish) from which you fetch content, and the destination is the instance to which you want to upload it.

Replication agent: nothing is different from any other replication agent except the Triggers configuration. Configure it as you see in the snapshot.

Hit this URL from a browser once your servlet and agent are in place: http://localhost:4502/bin/support/content/publisher?path=/etc/packages/abc.zip&destEnvName=QA&publishChildNodes=true (the publishChildNodes parameter is needed only when you also want to publish child nodes).

Replication Request Handler

import com.day.cq.replication.*;
import org.apache.commons.lang3.StringUtils;
import org.apache.felix.scr.annotations.Component;
import org.apache.felix.scr.annotations.Reference;
import org.apache.felix.scr.annotations.sling.SlingServlet;
import org.apache.http.HttpStatus;
import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.SlingHttpServletResponse;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.servlets.SlingAllMethodsServlet;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.jcr.Session;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Sample URL http://localhost:4502/bin/support/content/publisher?path=/etc/packages/abc.zip&destEnvName=QA&publishChildNodes=true
 */
@SlingServlet(paths = "/bin/support/content/publisher",
        methods = "GET", metatype = true, label = "Content publisher to publish content across environments")
public class PackagePublisher extends SlingAllMethodsServlet {
    private static final Logger LOGGER = LoggerFactory.getLogger(PackagePublisher.class);

    @Reference
    private Replicator replicator;

    @Override
    public final void doGet(final SlingHttpServletRequest request, final SlingHttpServletResponse response) throws IOException {
        String requestPath = request.getParameter("path");
        String publishChildNodes = request.getParameter("publishChildNodes");
        final String destEnvName = request.getParameter("destEnvName");
        if (StringUtils.isBlank(requestPath) || StringUtils.isBlank(destEnvName)) {
            response.setStatus(HttpStatus.SC_BAD_REQUEST);
            response.getWriter().print("Parameters are not passed.");
            return;
        }
        // A servlet instance is shared across requests, so per-request state stays in a local variable.
        List<String> activatedPaths = new ArrayList<String>();
        Session userSession = request.getResourceResolver().adaptTo(Session.class);
        ReplicationOptions replicationOptions = new ReplicationOptions();
        // Use only the agents whose ID contains the destination environment name.
        replicationOptions.setFilter(new AgentFilter() {
            @Override
            public boolean isIncluded(Agent agent) {
                return agent.getId().toLowerCase().contains(destEnvName.toLowerCase());
            }
        });
        LOGGER.info("Replication starting for {}", requestPath);
        try {
            replicator.replicate(userSession, ReplicationActionType.ACTIVATE, requestPath, replicationOptions);
            if ("true".equalsIgnoreCase(publishChildNodes)) {
                Resource rootResource = request.getResourceResolver().getResource(requestPath);
                publishChildPages(rootResource, userSession, replicationOptions, activatedPaths);
            }
            for (String path : activatedPaths) {
                LOGGER.info("Activated path {}", path);
            }
            response.setStatus(HttpStatus.SC_OK);
            response.getWriter().print("given path is replicated to given environment. Check in destination env.");
        } catch (ReplicationException e) {
            LOGGER.error("Replication failed for " + requestPath, e);
            response.setStatus(HttpStatus.SC_BAD_REQUEST);
            response.getWriter().print("Check Parameters. Also check author replication agents for " + destEnvName);
        } catch (Exception ex) {
            LOGGER.error("Unexpected error while replicating " + requestPath, ex);
            response.setStatus(HttpStatus.SC_BAD_REQUEST);
            response.getWriter().print("Something was wrong!!");
        }
    }

    private void publishChildPages(Resource parentResource, Session userSession,
                                   ReplicationOptions replicationOptions,
                                   List<String> activatedPaths) throws ReplicationException {
        if (parentResource == null) {
            return;
        }
        Iterator<Resource> itr = parentResource.listChildren();
        while (itr.hasNext()) {
            Resource child = itr.next();
            // Skip ACL nodes and jcr:content subtrees; they are not replicated as separate paths here.
            if (child.getPath().contains("rep:policy") || child.getPath().contains("jcr:content")) {
                continue;
            }
            // Replicate the node itself before descending into its children.
            replicator.replicate(userSession, ReplicationActionType.ACTIVATE, child.getPath(), replicationOptions);
            activatedPaths.add(child.getPath());
            if (child.hasChildren()) {
                publishChildPages(child, userSession, replicationOptions, activatedPaths);
            }
        }
    }
}

Final Thought

I find it very handy in day-to-day work when you want to move content here and there. If there is any confusion or question, leave a comment and I will respond ASAP. Thanks.

You can further extend this utility with an automated script that packages content and transports it from the source instance to the destination.

AEM Solution: How to override third-party OSGI Config in AEM

Overview

Overriding is a well-known concept in AEM. Components can be overridden in an application through Sling's resource resolution. Similarly, Apache Sling has a resolution order for OSGi configs. Basically, application-specific configs can override AEM /libs configs. For more details on OSGi configs and their resolution order, see my other posts.

Scenarios/Problems

Consider a project where you have to use another AEM project or third-party source code such as ACS Commons. There are some services you want to use, but with a different configuration. Assume the services already have configurations in place.

You might be thinking, what is the big deal? We can deploy the third-party code and change the configuration. Well, that is the whole point. The moment you change the configuration for your application, you create one more maintenance issue. Got it? No?

No worries. The issue is that every time you update or install the third-party code, you have to change the configuration manually again. You might think that is okay, but it is not when you have to do it in every environment after every deployment. Someone has to fix it every time. And what if the configuration has to be updated in the prod publish environment? It is tough to maintain. Let me explain with config files.

#Third-party XML config file for the ContentFeed service, under the config.publish folder; the apps folder is /apps/abc/thirdparty/. The ContentFeed service is part of the platform code base.

#Base config: com.abc.core.contentFeed.xml
<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:sling="http://sling.apache.org/jcr/sling/1.0" xmlns:jcr="http://www.jcp.org/jcr/1.0"
    jcr:primaryType="sling:OsgiConfig"
    contentFeedPath="https://example.com/api/core/platform"/>

#Your application wants to override this config. Its apps folder is /apps/myapp.

#Brand specific config: com.abc.core.contentFeed.xml

<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:sling="http://sling.apache.org/jcr/sling/1.0" xmlns:jcr="http://www.jcp.org/jcr/1.0"
    jcr:primaryType="sling:OsgiConfig"
    contentFeedPath="https://example.com/api/feed<myapp-feeds>"/>

Now the ContentFeed service does not pick up your application-specific config, even though you have the same config in your application. It always picks up the third-party config. This happens because the ContentFeed service is part of the platform source code. I hope the problem is clear now. Let's see a solution.


General Solution

A general approach is to keep modifying the third-party configs in every environment after each deployment. Then, when QA reports that something is not working, a new person struggles to find out why. Most of us do not remember that service configs are overwritten by the deployment. The second approach is to override the full code itself with a different service name and configuration. That is much easier to maintain, but you end up with duplicate code and the same bugs as the third-party code, and you have to port every fix made in the third-party source.

Alternative Solution

The solution to the above problem is simple. Make sure your app's config file name is the same as the third-party config's. Read about XML config and service naming conventions in the post AEM OSGi Config Resolution Order.

To make sure the service picks up your configuration, create the same XML config under the same run mode folder name with additional run mode info, because that is how AEM gives it precedence over other configs. Let me explain with an example.

#Third-party config file name is com.abc.core.contentFeed.xml
Folder structure is: /apps/third-party/config.publish, config.author etc.

#Your app config name must be com.abc.core.contentFeed.xml
Your folder structure for the config must be: /apps/third-party/config.publish.qa, /apps/third-party/config.publish.stage etc.
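
The precedence rule behind this trick can be modelled in a few lines of plain Java. This is a simplified sketch, not Sling's actual installer code: it assumes a config folder applies only when all run modes in its name are active, and that among applicable folders the one naming the most run modes wins. The class RunModeConfigPicker and its methods are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Simplified model of run mode precedence; real Sling resolution considers more factors.
final class RunModeConfigPicker {

    /** Extracts the run modes from a folder name such as "config.publish.qa". */
    static List<String> runModesOf(String folderName) {
        String[] parts = folderName.split("\\.");
        return Arrays.asList(parts).subList(1, parts.length);
    }

    /** Picks the applicable folder that names the most run modes, if any applies. */
    static Optional<String> pick(List<String> folders, Set<String> activeRunModes) {
        String best = null;
        int bestCount = -1;
        for (String folder : folders) {
            List<String> modes = runModesOf(folder);
            if (!activeRunModes.containsAll(modes)) {
                continue; // folder requires a run mode that is not active
            }
            if (modes.size() > bestCount) {
                best = folder;
                bestCount = modes.size();
            }
        }
        return Optional.ofNullable(best);
    }
}
```

On an instance running with the publish and qa run modes, config.publish.qa beats config.publish under this model, which is why the app-specific file in the more specific folder is the one the service ends up with.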

I hope the solution is clear. It seems simple, but it solves a common problem. Leave a comment if you have any questions or feedback. Thanks in advance.