Automobile manufacturer
Real-time streaming of Telelog vehicle data with Kafka and Storm
Starting point
- Functional improvements are to be achieved for each market and vehicle model
- The current data does not support any conclusions about how vehicle feature usage correlates with external factors such as GPS position, weather, passengers, or traffic
- No detailed usage data on vehicle features is available
- No real-time data from the vehicles is available for integration
- As a result, offering services based on individual vehicle usage is not possible
Procedure
- Gathering requirements and selecting the most suitable streaming framework (Flume, Spark, Storm)
- Use case onboarding and HDFS Data Lake setup with a focus on tenant security
- Setting up Kafka for the DEV/INT/PROD environments and coordinating with the producer servers
- Deployment of a Storm Trident topology for real-time processing of Kafka topics (anonymisation, data quality checks, validation); see the sketch after this list
- HDFS sink for further analysis with Hive
- System testing (stress/unit testing, performance optimisation)
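To make the processing step more concrete, below is a minimal, illustrative sketch of what a Storm 1.0 Trident topology of this kind can look like: an opaque Kafka spout reads the raw Telelog records, a function anonymises the vehicle identifier in flight, and a filter drops malformed records before the stream is handed to the HDFS sink. The topic name, ZooKeeper hosts, record layout and the AnonymiseVin/IsValidRecord helpers are assumptions for illustration, not the project's actual code.

```java
// Illustrative sketch only: a Storm 1.0 Trident topology consuming raw Telelog
// records from Kafka, anonymising them on the fly and filtering bad records.
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.kafka.StringScheme;
import org.apache.storm.kafka.ZkHosts;
import org.apache.storm.kafka.trident.OpaqueTridentKafkaSpout;
import org.apache.storm.kafka.trident.TridentKafkaConfig;
import org.apache.storm.spout.SchemeAsMultiScheme;
import org.apache.storm.trident.TridentTopology;
import org.apache.storm.trident.operation.BaseFilter;
import org.apache.storm.trident.operation.BaseFunction;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.tuple.TridentTuple;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class TelelogTridentTopology {

    /** Replaces the clear-text vehicle identifier with a hash (hypothetical record layout: "VIN;payload"). */
    public static class AnonymiseVin extends BaseFunction {
        @Override
        public void execute(TridentTuple tuple, TridentCollector collector) {
            String raw = tuple.getString(0);
            int sep = raw.indexOf(';');
            String vin = sep > 0 ? raw.substring(0, sep) : raw;
            String rest = sep > 0 ? raw.substring(sep) : "";
            collector.emit(new Values(Integer.toHexString(vin.hashCode()) + rest));
        }
    }

    /** Simple data-quality gate: drop empty or obviously truncated records. */
    public static class IsValidRecord extends BaseFilter {
        @Override
        public boolean isKeep(TridentTuple tuple) {
            String record = tuple.getString(0);
            return record != null && !record.isEmpty() && record.contains(";");
        }
    }

    public static StormTopology build() {
        // Opaque Kafka spout: exactly-once processing semantics across worker failures.
        TridentKafkaConfig spoutConf =
                new TridentKafkaConfig(new ZkHosts("zk1:2181,zk2:2181,zk3:2181"), "telelog-raw");
        spoutConf.scheme = new SchemeAsMultiScheme(new StringScheme()); // emits field "str"

        TridentTopology topology = new TridentTopology();
        topology.newStream("telelog-kafka", new OpaqueTridentKafkaSpout(spoutConf))
                .each(new Fields("str"), new AnonymiseVin(), new Fields("anonymised"))
                .each(new Fields("anonymised"), new IsValidRecord())
                // In the real topology a storm-hdfs Trident state (partitionPersist)
                // would write the "anonymised" field to the HDFS Data Lake for Hive.
                .project(new Fields("anonymised"));
        return topology.build();
    }

    public static void main(String[] args) throws Exception {
        Config conf = new Config();
        conf.setNumWorkers(3);
        StormSubmitter.submitTopology("telelog-trident", conf, build());
    }
}
```

The opaque Trident spout is also what underpins the resilience noted under customer benefits: combined with replicated Kafka topics, records are reprocessed rather than lost when workers fail.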
Features/Project outcome
- Kerberized HDP 2.5 with functional, tested Kafka 0.10 and Storm 1.0
- Automated deployment scripts for the DEV/INT/PROD environments
- The technical prerequisites for providing and transmitting real-time vehicle data have been established
- Big Data reporting connection from Hive via Knox to Tableau Server is in place (see the query example after this list)
- The data and analytics platform enables data scientists to perform vehicle data analyses
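As an illustration of the reporting connection, the snippet below shows how a Hive table in the Data Lake can be queried over JDBC through the Knox gateway; Tableau Server connects to the same HTTP/TLS endpoint. Host name, gateway path, credentials and the telelog_events table are placeholder assumptions, not the customer's actual configuration.

```java
// Illustrative sketch: querying Hive through the Knox gateway over JDBC.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveViaKnoxExample {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // HiveServer2 in HTTP transport mode, proxied by Knox over TLS.
        String url = "jdbc:hive2://knox.example.com:8443/default;"
                + "ssl=true;transportMode=http;httpPath=gateway/default/hive";
        try (Connection conn = DriverManager.getConnection(url, "analyst", "password");
             Statement stmt = conn.createStatement();
             // Hypothetical table written by the Storm HDFS sink.
             ResultSet rs = stmt.executeQuery(
                     "SELECT feature, COUNT(*) AS usage_count "
                   + "FROM telelog_events GROUP BY feature")) {
            while (rs.next()) {
                System.out.printf("%s -> %d%n",
                        rs.getString("feature"), rs.getLong("usage_count"));
            }
        }
    }
}
```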
Customer benefits
- Long-term and cost-effective storage of vehicle usage data in the Data Lake
- Data is anonymised on the fly and can be used in real time by data scientists
- The department can conduct its own analyses (Self-Service BI)
- Real-time vehicle data analysis is possible
- Individual offers based on vehicle usage data
- The application is highly resilient: data loss is very unlikely even if multiple components fail

Marco Bruno | Senior Manager / Authorised Officer