Data Cleaning and Visualization

Scenario

You have been provided with an export from the security information and event management (SIEM) system used by DCE’s incident response team. The team extracted alert data from the SIEM platform and has provided a CSV file (MLData2023.csv) containing 500,000 event records, of which approximately 3,000 have been ‘tagged’ as malicious. The goal is to integrate machine learning into the SIEM platform so that suspicious events can be investigated in real time.

Data description

Each event record is a snapshot triggered by an individual network ‘packet’. The exact triggering conditions for the snapshot are unknown, but it is known that multiple packets are exchanged in a ‘TCP conversation’ between the source and the target before an event is triggered and a record created. It is also known that each event record is anomalous in some way (the SIEM logs many events that may be suspicious).

A very small proportion of the data are known to be corrupted by their source systems, and some data are incomplete or incorrectly tagged. The incident response team indicated that this is likely to affect fewer than a few hundred records.

A list of the relevant features in the data is given below.

Assembled Payload Size (continuous): The total size of the inbound suspicious payload. Note: this would contain the data sent by the attacker in the ‘TCP conversation’ up until the event was triggered.

DYNRiskA Score (continuous): An untested built-in risk score assigned by a new SIEM plug-in.

IPV6 Traffic (binary): A flag indicating whether the triggering packet was using IPV6 or IPV4 protocols (True = IPV6).

Response Size (continuous): The total size of the reply data in the TCP conversation prior to the triggering packet.

Source Ping Time (ms) (continuous): The ‘ping’ time to the IP address which triggered the event record. This is affected by network structure, the number of ‘hops’ and even physical distance.
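Because the brief warns that some records are corrupted or incomplete, a first cleaning pass might separate rows whose fields parse sensibly from rows that need a closer look. The following is a minimal sketch under stated assumptions, not a prescribed method. It assumes the CSV headers match the feature names listed above (the real headers in MLData2023.csv may differ), and it treats negative sizes or ping times as invalid. The sample rows are invented for illustration and do not come from the real file.

```python
import csv
import io
import math

# ASSUMPTION: headers match the feature names in the brief; the real
# headers in MLData2023.csv may be spelled differently.
NUMERIC_COLS = [
    "Assembled Payload Size",
    "DYNRiskA Score",
    "Response Size",
    "Source Ping Time (ms)",
]
# ASSUMPTION: sizes and times cannot be negative, so a negative value
# is treated as a sign of corruption.
NON_NEGATIVE_COLS = [
    "Assembled Payload Size",
    "Response Size",
    "Source Ping Time (ms)",
]
FLAG_COL = "IPV6 Traffic"


def parse_number(text):
    """Return a finite float, or None when the field is blank or unparseable."""
    try:
        value = float(text)
    except (TypeError, ValueError):
        return None
    return value if math.isfinite(value) else None


def parse_flag(text):
    """Map 'true'/'false' (any case) to a bool, or None if unrecognised."""
    return {"true": True, "false": False}.get((text or "").strip().lower())


def clean_rows(reader):
    """Split rows into (clean, rejected).

    Clean rows are returned with parsed values; rejected rows are kept
    in their raw form so they can be inspected later.
    """
    clean, rejected = [], []
    for row in reader:
        parsed = {col: parse_number(row.get(col)) for col in NUMERIC_COLS}
        parsed[FLAG_COL] = parse_flag(row.get(FLAG_COL))
        has_missing = any(value is None for value in parsed.values())
        has_negative = any(
            parsed[col] is not None and parsed[col] < 0
            for col in NON_NEGATIVE_COLS
        )
        if has_missing or has_negative:
            rejected.append(row)
        else:
            clean.append(parsed)
    return clean, rejected


# Invented sample rows, for illustration only.
SAMPLE = """Assembled Payload Size,DYNRiskA Score,IPV6 Traffic,Response Size,Source Ping Time (ms)
512,0.42,False,1024,35
-1,0.10,True,300,120
2048,,False,900,48
4096,0.91,TRUE,2000,210
"""

clean, rejected = clean_rows(csv.DictReader(io.StringIO(SAMPLE)))
print(len(clean), "clean rows;", len(rejected), "rejected rows")
```

To run the same pass over the real export, the `io.StringIO(SAMPLE)` source could be swapped for an open file handle on MLData2023.csv. Whether rejected rows should then be dropped, repaired or imputed is a decision the brief leaves open; keeping them in a separate list makes it possible to review them before choosing.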
