intelmq.bots.parsers.shadowserver package¶
Submodules¶
intelmq.bots.parsers.shadowserver.config module¶
Copyright (c)2016-2018 by Bundesamt für Sicherheit in der Informationstechnik (BSI)
Software engineering by BSI & Intevation GmbH
This is a configuration File for the shadowserver parser
In the following, intelmqkey are arbitrary keys from intelmq’s harmonization and shadowkey is a column name from shadowserver’s data.
Every bot-type is defined by a dictionary with three values:
- required_fields: A list of tuples containing intelmq’s field name, the field name from the data and an optional conversion function. An error is raised if the field does not exist in the data.
- optional_fields: Same format as above, but no error is raised if the field does not exist. If there’s no mapping to an intelmq field, you can set the intelmqkey to extra. and the field will be added to the extra field using the original field name. See the section below for possible tuple values.
- constant_fields: A dictionary with a static mapping of field name to data, e.g. to set classifications or protocols.
The tuples can have the following formats:
(‘intelmqkey’, ‘shadowkey’), the data from the column shadowkey will be saved in the event’s field intelmqkey. Logically equivalent to: `event[intelmqkey] = row[shadowkey]`.
(‘intelmqkey’, ‘shadowkey’, conversion_function), the given function will be used to convert and/or validate the data. Logically equivalent to: `event[intelmqkey] = conversion_function(row[shadowkey])`.
(‘intelmqkey’, ‘shadowkey’, conversion_function, True), the function gets two parameters here, the second one is the full row (as dictionary). Logically equivalent to: `event[intelmqkey] = conversion_function(row[shadowkey], row)`.
(‘extra.’, ‘shadowkey’, conversion_function), the data will be added to extra in this case, the resulting name is extra.[shadowkey]. The conversion_function is optional. Logically equivalent to: `event['extra.' + shadowkey] = conversion_function(row[shadowkey])`.
(False, ‘shadowkey’), the column will be ignored.
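The tuple formats above can be illustrated with a small sketch. It is not the parser's actual code: `apply_mapping`, `build_url`, the sample row and the sample mapping are all hypothetical names introduced here to show how each tuple shape could be interpreted.

```python
def apply_mapping(row, mapping):
    """Apply a list of mapping tuples to one row (a dict) and return an event dict.

    Hypothetical helper, written only to illustrate the tuple formats.
    """
    event = {}
    for entry in mapping:
        intelmqkey, shadowkey = entry[0], entry[1]
        if intelmqkey is False:
            continue  # (False, 'shadowkey'): the column is ignored
        value = row[shadowkey]
        if len(entry) >= 3:
            conversion_function = entry[2]
            if len(entry) == 4 and entry[3]:
                # four-element form: the function also receives the full row
                value = conversion_function(value, row)
            else:
                value = conversion_function(value)
        if intelmqkey == 'extra.':
            # 'extra.' form: keep the original column name below extra
            intelmqkey = 'extra.' + shadowkey
        event[intelmqkey] = value
    return event


def build_url(value, row):
    # hypothetical two-argument conversion function
    return 'http://' + row['http_host'] + value


row = {'ip': '192.0.2.1', 'port': '80', 'http_host': 'example.com',
       'url': '/index.html', 'tag': 'http', 'sic': '0'}
mapping = [
    ('source.ip', 'ip'),
    ('source.port', 'port', int),
    ('destination.url', 'url', build_url, True),
    ('extra.', 'tag'),
    (False, 'sic'),
]
event = apply_mapping(row, mapping)
```

Here `event` holds `source.ip`, `source.port` (as an integer), `destination.url` and `extra.tag`, while the `sic` column is dropped.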
Mappings are straightforward: each mapping is a dict of at least three keys:
required fields: the parser processes these keys first.
optional fields: the parser will try to interpret these values. If it fails, the value is written to the extra field.
constant fields: Some information about an event may not be explicitly stated in a feed because it is implicit in the nature of the feed. For instance, a feed that is exclusively about HTTP may not have a field for the protocol because it’s always TCP.
In each tuple, the first value is the IntelMQ key and the second value is the column in the shadowserver CSV.
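As a rough illustration of how the three groups differ, the following sketch processes one row against a made-up feed definition. `build_event` and `example_feed` are hypothetical and handle only two- and three-element tuples; the real parser may order or handle things differently.

```python
def build_event(row, feed_config):
    """Hypothetical sketch of required/optional/constant field handling."""
    event = {}
    # Required fields come first; a missing column is an error.
    for intelmqkey, shadowkey, *conv in feed_config['required_fields']:
        if shadowkey not in row:
            raise ValueError('Required column %r not found in row.' % shadowkey)
        value = row[shadowkey]
        event[intelmqkey] = conv[0](value) if conv else value
    # Optional fields: skipped if absent, moved to extra if conversion fails.
    for intelmqkey, shadowkey, *conv in feed_config['optional_fields']:
        if shadowkey not in row:
            continue
        value = row[shadowkey]
        try:
            event[intelmqkey] = conv[0](value) if conv else value
        except ValueError:
            event['extra.' + shadowkey] = value
    # Constant fields are implied by the feed itself, not by the row.
    event.update(feed_config['constant_fields'])
    return event


# Made-up feed definition for an HTTP-only feed (assumption, not a real config).
example_feed = {
    'required_fields': [('time.source', 'timestamp'), ('source.ip', 'ip')],
    'optional_fields': [('source.port', 'port', int), ('source.asn', 'asn', int)],
    'constant_fields': {'protocol.transport': 'tcp',
                        'protocol.application': 'http'},
}

row = {'timestamp': '2019-01-01 00:00:00', 'ip': '192.0.2.1',
       'port': '80', 'asn': 'n/a'}
event = build_event(row, example_feed)
```

In this example the `asn` value cannot be converted to an integer, so it ends up in `extra.asn`, and the transport protocol is set from the constant fields although the row has no such column.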
- Reference material:
when setting the classification.* fields, please use the taxonomy from the Data Harmonization Classification or upstream from https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force/
please respect the Data harmonization ontology: Data Harmonization
- TODOs:
There are a number of inline TODOs. Most of them mark lines of code where the mapping has to be validated.
@ Check-Implementation Tags for parser configs. dmth thinks it’s not sufficient. Some CERT expertise is needed to check whether the mappings are correct.
feed_idx is not complete.
-
intelmq.bots.parsers.shadowserver.config.
add_UTC_to_timestamp
(value)¶
-
intelmq.bots.parsers.shadowserver.config.
convert_bool
(value)¶
-
intelmq.bots.parsers.shadowserver.config.
convert_date
(value)¶
-
intelmq.bots.parsers.shadowserver.config.
convert_float
(value)¶ Returns a float or None for empty strings.
-
intelmq.bots.parsers.shadowserver.config.
convert_http_host_and_url
(value, row)¶ URLs are split into hostname and path. The column names differ between reports:
- Compromised-Website: http_host, url
- Drone: cc_dns, url
- IPv6-Sinkhole-HTTP-Drone: http_host, http_url
- Microsoft-Sinkhole: http_host, url
- Sinkhole-HTTP-Drone: http_host, url
With some reports, url/http_url holds only the path, with others the full HTTP request.
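The normalisation problem described here (a bare path in some reports, a full request line in others) might be handled along these lines. This is only a sketch: `extract_path` and `sketch_host_and_url` are hypothetical, the `http://` scheme is an assumption, and what the real function returns is not stated in this documentation.

```python
def extract_path(request_or_path):
    """Return the path from a bare path or a request line like 'GET /x HTTP/1.1'."""
    parts = request_or_path.split(' ')
    if len(parts) == 3 and parts[2].startswith('HTTP/'):
        return parts[1]
    return request_or_path


def sketch_host_and_url(value, row):
    """Combine the host column (value) with the path found in the row.

    Hypothetical helper; the scheme 'http://' is assumed.
    """
    raw = row.get('http_url') or row.get('url') or ''
    path = extract_path(raw.strip())
    if not path.startswith('/'):
        path = '/' + path
    return 'http://' + value + path


full_request = sketch_host_and_url('example.com', {'url': 'GET /a/b HTTP/1.1'})
bare_path = sketch_host_and_url('example.com', {'http_url': '/index.html'})
```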
-
intelmq.bots.parsers.shadowserver.config.
convert_int
(value)¶ Returns an int or None for empty strings.
-
intelmq.bots.parsers.shadowserver.config.
get_feed_by_feedname
(given_feedname)¶
-
intelmq.bots.parsers.shadowserver.config.
get_feed_by_filename
(given_filename)¶
-
intelmq.bots.parsers.shadowserver.config.
invalidate_zero
(value)¶ Returns an int or None for empty strings or ‘0’.
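The documented behaviour of convert_int, convert_float and invalidate_zero can be approximated as follows. The `sketch_` functions are stand-ins written from the one-line descriptions above, not copies of the module's code, so edge cases may differ.

```python
def sketch_convert_int(value):
    """Return an int, or None for an empty string."""
    return int(value) if value else None


def sketch_convert_float(value):
    """Return a float, or None for an empty string."""
    return float(value) if value else None


def sketch_invalidate_zero(value):
    """Return an int, or None for an empty string or '0'."""
    if not value:
        return None
    number = int(value)
    return number if number != 0 else None
```

Returning None rather than a placeholder value lets the caller drop the field entirely instead of storing a meaningless 0 or empty string.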
-
intelmq.bots.parsers.shadowserver.config.
set_tor_node
(value)¶
-
intelmq.bots.parsers.shadowserver.config.
validate_fqdn
(value)¶
-
intelmq.bots.parsers.shadowserver.config.
validate_ip
(value)¶ Remove “invalid” IP.
-
intelmq.bots.parsers.shadowserver.config.
validate_network
(value)¶
-
intelmq.bots.parsers.shadowserver.config.
validate_to_none
(value)¶
intelmq.bots.parsers.shadowserver.parser module¶
Copyright (C) 2016 by Bundesamt für Sicherheit in der Informationstechnik
Software engineering by Intevation GmbH
This is an “all-in-one” parser for a lot of shadowserver feeds. It depends on the configuration in the file “config.py”, which holds information on how to treat certain shadowserver feeds. It uses the report field extra.file_name to determine which config should apply, so this field is required.
This parser will only work with csv files named like 2019-01-01-scan_http-country-geo.csv.
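To make the file name convention concrete, here is a sketch that splits such a name into its date and the part following it. The regular expression and the function name `split_report_filename` are assumptions made for this example; the parser's own matching rules may be stricter or looser.

```python
import re

# Assumed pattern: date, report type, optional suffix parts, '.csv'.
FILENAME_PATTERN = re.compile(r'^(\d{4}-\d{2}-\d{2})-(\w+)(?:-[\w-]+)?\.csv$')


def split_report_filename(file_name):
    """Return (date, report_type) for a matching file name, else None."""
    match = FILENAME_PATTERN.match(file_name)
    if match is None:
        return None
    return match.group(1), match.group(2)


result = split_report_filename('2019-01-01-scan_http-country-geo.csv')
```

For the example name, `result` is the date `2019-01-01` together with `scan_http`; a name that does not follow the convention yields None.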
- Optional parameters:
- overwrite: Bool, default False. If True, it keeps the report’s feed.name and does not override it with the corresponding feed name.
- feedname: The fixed feed name to use if it should not be detected automatically.
-
intelmq.bots.parsers.shadowserver.parser.
BOT
¶ alias of
intelmq.bots.parsers.shadowserver.parser.ShadowserverParserBot
-
class
intelmq.bots.parsers.shadowserver.parser.
ShadowserverParserBot
(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶ Bases:
intelmq.lib.bot.ParserBot
-
csv_params
= {'dialect': 'unix'}¶
-
feedname
= None¶
-
init
()¶
-
mode
= None¶
-
parse
(report)¶ A generator yielding the single elements of the data.
Comments, headers etc. can be processed here. Data needed by self.parse_line can be saved in self.tempdata (list).
Default parser yields stripped lines. Override for your use or use an existing parser, e.g.:
parse = ParserBot.parse_csv
You should do that for recovering lines too:
recover_line = ParserBot.recover_line_csv
-
parse_line
(row, report)¶ A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
-
recover_line
(line: str)¶ Converts dictionaries to csv. self.csv_fieldnames must be a list of fields.
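Turning a parsed row back into CSV text can be sketched with the standard library. `recover_csv_line` is a hypothetical stand-alone function, not the method itself; it uses the `unix` dialect because that is what csv_params specifies, and writing a header line is an assumption.

```python
import csv
import io


def recover_csv_line(row, fieldnames):
    """Serialise one row dict back to CSV text (header plus one data line)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, dialect='unix')
    writer.writeheader()
    writer.writerow(row)
    return out.getvalue()


fieldnames = ['ip', 'port']
recovered = recover_csv_line({'ip': '192.0.2.1', 'port': '80'}, fieldnames)
```

The field name list fixes the column order, which is why it has to be known before a row can be recovered.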
-
sparser_config
= None¶
-
intelmq.bots.parsers.shadowserver.parser_json module¶
Shadowserver JSON Parser
SPDX-FileCopyrightText: 2020 Intelmq Team <intelmq-team@cert.at>
SPDX-License-Identifier: AGPL-3.0-or-later
-
intelmq.bots.parsers.shadowserver.parser_json.
BOT
¶ alias of
intelmq.bots.parsers.shadowserver.parser_json.ShadowserverJSONParserBot
-
class
intelmq.bots.parsers.shadowserver.parser_json.
ShadowserverJSONParserBot
(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶ Bases:
intelmq.lib.bot.ParserBot
Shadowserver JSON Parser
- feedname: str
The name of the feed
-
feedname
= None¶
-
get_value_from_config
(data, entry)¶ Given a specific config, get the value for that data based on the entry
-
init
()¶
-
parse
(report)¶ A generator yielding the single elements of the data.
Comments, headers etc. can be processed here. Data needed by self.parse_line can be saved in self.tempdata (list).
Default parser yields stripped lines. Override for your use or use an existing parser, e.g.:
parse = ParserBot.parse_csv
You should do that for recovering lines too:
recover_line = ParserBot.recover_line_csv
-
parse_line
(line: Any, report: intelmq.lib.message.Report)¶ A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
-
recover_line
(line: dict)¶ Reverse of parse for JSON pulses.
Recovers a fully functional report with only the problematic pulse.
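The idea of recovering a report that contains only the problematic element could look like the sketch below. It assumes the report body is a JSON array of objects, which this documentation does not state; `recover_json_line` is a hypothetical stand-alone function.

```python
import json


def recover_json_line(line):
    """Wrap a single parsed element in a JSON array again (assumed report shape)."""
    return json.dumps([line])


recovered_report = recover_json_line({'ip': '192.0.2.1', 'port': 80})
```

Feeding `recovered_report` back through a JSON parser yields a one-element list, so the failing element can be reprocessed in isolation.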
-
reporttype_fn
= None¶
-
sparser_config
= None¶