Implications of configuring LEM to store original logs

I would like to understand the implications to my LEM environment if I were to configure it to store the original logs as per the KB article HERE.

Thanks in advance for any feedback!

  • I would assume that there is a considerable increase in storage usage as this data is stored separately in addition to the normalized data?  Any other implications?

  • I am very interested in this as well.  I assume we would need a separate appliance to store all this...

  • I know that you can configure the same appliance to store it; I configured it in a lab once.  I am more concerned about the impact on storage and performance for the appliance... and any other impacts I have not thought of.

  • FormerMember

    There is some event processing overhead on the appliance, but it is mostly storage and IO. The original log store is compressed, and since it's all text data you can imagine it's relatively well-compressed. Putting the log storage on the same disk as the normalized storage just means they are both reading/writing to the same location - could slow down searching, reporting, etc. We do have the ability to configure for a secondary appliance (no additional licensing).

    They will work to coexist on the same disk, about 50/50 (this ratio is configurable on our end with assistance, if necessary), so data will rotate out of each storage platform to prevent you from running out of disk space. That could mean that you squeeze your timeframe for the normalized store too far, though you could expand the disk.

    On the event processing side, it does create additional load at the core, but usually if everything's running fine things are relatively invisible.
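    To put rough numbers on the shared-disk split described above, here is a minimal Python sketch. The function names and the daily growth figures are hypothetical and only for illustration; the 50/50 default mirrors the ratio mentioned in this reply, and you would need to measure your own on-disk growth (after compression) for each store.

```python
def split_disk(disk_gb, raw_share=0.5):
    """Return (raw_gb, normalized_gb) for one disk shared by both stores.

    raw_share is the fraction given to the original (raw) log store.
    """
    if not 0.0 <= raw_share <= 1.0:
        raise ValueError("raw_share must be between 0 and 1")
    raw_gb = disk_gb * raw_share
    return raw_gb, disk_gb - raw_gb


def retention_days(store_gb, daily_gb):
    """Days of data a store can hold before the oldest data rotates out."""
    if daily_gb <= 0:
        raise ValueError("daily_gb must be positive")
    return store_gb / daily_gb


# Assumed example: default 250 GB disk, split evenly.
raw_gb, normalized_gb = split_disk(250, raw_share=0.5)

# Hypothetical daily growth figures; replace with measured values.
print("normalized store days:", retention_days(normalized_gb, daily_gb=1.0))
print("raw log store days:", retention_days(raw_gb, daily_gb=0.5))
```

    The point of the sketch is only that enabling the raw store on the same disk halves the space available to the normalized store, so its retention window shrinks accordingly unless the disk is expanded.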

  • In our case even if we had it on a different appliance it would likely be using the same storage back-end so I am not sure how much benefit this would be. 

    Do you have architecture documentation you can point me to for architecting LEM solutions for different environments using these different options that you mention?  We are a service provider looking to provide both a SIEM IaaS solution as well as managed NOC and HIPAA/PCI Cloud services utilizing LEM so understanding the architecture options is going to be important for us.

    Also, for the 2nd appliance option is there documentation you can point me to on how to configure this?

    I really love hearing about all of the different ways LEM solutions can be set up, but I would prefer to be able to configure these things myself rather than needing to call support and have them do it for me.

  • The LEM virtual appliance deploys, by default, with a 250GB disk.  In version 5.4, the limit to expand this was 1TB because of license restrictions for the database that was used.  In 5.6, the sky is the limit.  However, in Hyper-V and some versions of ESX, the limit on the size of a virtual disk is 2TB.  In ESX 5.5, this limit is now 62TB (see page 12).  LEM 5.6+ will happily expand to fill that space, meaning that you can keep more information in both the normalized alert and raw log databases.

    It's true that the LEM can be split into multiple hosts, but this is a solution that has been discouraged since the LEM went to a virtual appliance (which is easy to expand and add cores and memory to) vs the old TriGeo SIM (which was hardware), and most companies would end up hosting multiple parts on the same host with the same storage, so there's very little gained by splitting the LEM up.  As such, most (all?) of the documentation on that sort of deployment is internal only, and the recommended solution is "use a virtual machine and make it bigger."

    I know that HIPAA, SOX and other groups have different retention requirements, and different formats that they consider acceptable.  Some auditors are happy if you run Reports regularly and keep those, some want the actual database accessible.  This can be a challenge for sizing: "Can I keep 5 years of data in the database?"  Maybe.  If you're getting 10KB of logs a day, a year is 3.6MB and 250GB is a lifetime of data.  If you get millions of events a day, it adds up.

    Can the processor handle it?  This also depends: how many rules are you running? What do those rules do?  If all you're doing is taking notes, the overhead can be minimal and the LEM will handle a lot of events easily.  If you have rules firing on every alert, the processing need can be intense.  Some of that can be alleviated by throwing more resources at the LEM in your virtual manager.
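    The sizing arithmetic above can be checked with a short Python sketch. The 10KB/day and 250GB figures come from this post; the high-volume case (events per day and average stored bytes per event) is an assumption picked purely for illustration, and decimal units (1KB = 1,000 bytes) are assumed throughout.

```python
KB = 1_000
MB = 1_000_000
GB = 1_000_000_000


def yearly_bytes(daily_bytes):
    """Bytes accumulated over a 365-day year at a constant daily rate."""
    return daily_bytes * 365


def days_until_full(disk_bytes, daily_bytes):
    """Days before a disk of the given size fills at a constant daily rate."""
    return disk_bytes / daily_bytes


# Low-volume case from the post: 10 KB of logs a day on a 250 GB disk.
low_daily = 10 * KB
print("MB per year:", yearly_bytes(low_daily) / MB)
print("years until full:", days_until_full(250 * GB, low_daily) / 365)

# High-volume case (hypothetical): 5 million events a day at an assumed
# 300 stored bytes per event.
high_daily = 5_000_000 * 300
print("days until full:", days_until_full(250 * GB, high_daily))
```

    At 10KB a day the disk effectively never fills, while under the assumed high-volume figures the same 250GB lasts well under a year, which is why the daily volume has to be measured before promising a multi-year retention window.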

  • Awesome follow-up, thanks for this.

    I understand the "add more beef to the system" concept.  I guess my only concern would be the drive size limiting my data retention, but that is potentially a problem either way.  Adding the raw data store just reduces it even more.

  • I guess the questions to answer are "How much data do you generate in a day?" and "How many days do you need to keep?"

    You might consider multiple LEMs monitoring different segments of the network, which at least allows you to "divide and conquer" the devices you need to monitor.