Government, telecommunications, healthcare, energy and utilities, finance, insurance and automotive all have different challenges and requirements. However, every industry shares the same opportunity: to harvest all data, all the time. Stream computing analyzes data in motion for immediate, accurate decision making.
This presentation is an introduction to InfoSphere Streams. First, we position current market challenges in the area of big data. Then we discuss how context-aware stream computing from IBM InfoSphere Streams addresses these challenges. Finally we present how InfoSphere Streams provides unique value across a range of industries. You can get started now with our InfoSphere Streams Quick Start program and new open source project.
Quick Start: http://www-01.ibm.com/software/data/infosphere/streams/quick-start/
Open Source: https://github.com/IBMStreams
Clients need to move from data management to action based on real-time insight. Speed isn’t just about how fast data is produced or changed, but about the speed at which data must be received, understood, and processed. This presentation will outline how to harness fast-moving data inside and outside of your organization.
Your organization needs to shift from management of data to action. Organizations should:
Select valuable data and insights to be stored for further processing
Process and analyze perishable data to take real-time action
Harness and process streaming data such as video, acoustic, thermal, geospatial or sensors
InfoSphere Streams is a development platform using a scale-out architecture. It includes comprehensive tools for development and management of the environment. The development environment also includes a set of toolkits that provide high-level functionality to accelerate development of solutions.
Since InfoSphere Streams processes data in memory, it has high velocity – it can respond to events in microseconds, 1/1000 of a millisecond. It is orders of magnitude faster than databases, which must first store data on disk drives. InfoSphere Streams can analyze and correlate any type of data (variety) – audio, video, network logs, sensors, social media such as Twitter, in addition to structured data. InfoSphere Streams is designed to scale to process any size of data, from terabytes to zettabytes per day. InfoSphere Streams can run a large variety of analytics – from historic analysis like data mining, to predictive analytics, to custom analytics such as image analysis, voice recognition, etc. InfoSphere Streams also provides tremendous agility. With the ability to dynamically add new applications that can tap into existing data streams and applications, businesses can respond more quickly to a changing world.
What is InfoSphere Streams?
Platform: InfoSphere Streams is not a solution or application, nor is it a limited-purpose tool. Instead, it is a platform. It comes with the tools, language, and building blocks that let you build programs for it, and with a runtime environment that lets you run those programs.
Real-time: The InfoSphere Streams programs you create do their processing and analysis in as close to real-time as it is possible to get on a standard IT platform. In this case, real-time means very low latency, where latency is the delay from the time a packet of data arrives to the time the result is available. A key factor here is that InfoSphere Streams does everything in memory; it has no concept of mass storage (disk).
Analytics: Because InfoSphere Streams is fast, scalable, and programmable, the kinds of analysis you can apply ranges from the simple to the extremely sophisticated. You are not limited to simple averages or if-then-else rules.
BIG data: Actually, make that infinite data. For purposes of program and algorithm design, streaming data has no beginning and no end and is therefore by definition infinite in volume. In practical terms, this means that InfoSphere Streams can process any kind of data feed, including those that would be much too slow or expensive to capture and store in their entirety.
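To make the "infinite data" point concrete, here is a rough sketch (in Python rather than Streams' own SPL) of a tumbling-window aggregation, the kind of in-memory, tuple-at-a-time processing described above. The function and variable names are illustrative, not part of any Streams API.

```python
def tumbling_window_avg(stream, window_size):
    """Yield the average of each fixed-size window over an unbounded stream.

    Each tuple is processed in memory as it arrives; nothing is written
    to disk, which is the key to the low latency described above.
    """
    window = []
    for value in stream:
        window.append(value)
        if len(window) == window_size:
            yield sum(window) / window_size
            window.clear()

# A finite iterable stands in for an endless sensor feed.
readings = iter([10, 20, 30, 40, 50, 60])
print(list(tumbling_window_avg(readings, 3)))  # -> [20.0, 50.0]
```

Because the generator holds only one window in memory at a time, the same code works whether the feed lasts a second or runs forever — the essence of designing for data with no beginning and no end.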
For more: http://www-01.ibm.com/software/data/infosphere/stream-computing/smarter-governments.html
Rapid urbanization, increasing strain on natural resources, citizen security and evolving global terror threats are the reality. However, tough problems also create new pathways for progress. By delivering real-time analytic processing on constantly changing data in motion, InfoSphere Streams is part of the solution. InfoSphere Streams enables predictive analytics of data in motion for real-time decisions allowing governments to capture and analyze data - all the time, just in time.
IBM teams with Brocade for better network security: http://www.ibm.com/software/businesscasestudies/en/us/software?docid=JHUN-95A6FW
Other areas where InfoSphere Streams helps:
Four ways InfoSphere Streams helps governments protect natural resources:
Management of wildfire risk: Analyze smoke patterns in real time via live video and pictorial feeds from satellite and unmanned surveillance vehicles. Provide safety officials with a real-time assessment of the fire, allowing them to make more informed decisions on public evacuations and health warnings.
Predictions of water quality and flow patterns: Visualize the movement of chemical constituents, monitor water quality and protect as well as analyze behavior of fish and marine mammal species as they migrate. All are key in providing a better scientific understanding of river and estuary ecosystems.
Security for electric grid: With utilities moving towards smart meter technology and use of sensors along transmission lines, there is a growing communication network infrastructure over the existing physical electric transmission infrastructure. Analyze real-time events across multiple layers of the network (IDS, firewalls etc) to predict cyber attacks and discover new threats as early as possible.
Protection of the energy grid from solar storms: Analyze data from sensors that track high frequency radio waves to protect citizens.
Three ways InfoSphere Streams helps governments create healthier citizens:
Better commuting options: Gather information from global positioning system (GPS) devices in taxi cabs and other vehicles in conjunction with data from delivery trucks, traffic sensors, transit systems, pollution monitors and weather information to provide real-time information on traffic flow and travel times.
Real-time disease outbreak detection: Perform scoring on information coming across to the central health monitoring agencies to alert authorities on any outbreak of dangerous diseases or conditions.
Smarter healthcare in intensive care units: Predict the potential onset of harmful conditions in ICU patients by running continuous analytics on physiological streams of sensor data from patients.
This slide presents an enterprise architecture for cyber security. First, the security analyst establishes base models of expected behavior/action for the enterprise – for example, the usual network traffic patterns for the web applications. The models are then deployed in IBM SPSS and enhanced with new sources of real-time streaming data. If a deviation from the expected models is discovered, alerts are displayed through Cognos (or any other visualization platform of choice) to enable the right action by the security analyst. In the event of an attack, such as a botnet, the security analyst is able to reconstruct the attack and add it quickly to the base models. Thus IBM enables real-time learning. Another unique differentiator is the ability to spot patterns and do correlations in real time on unconventional data types such as DNS logs. Most cyber security solutions are designed to protect against known threats. The IBM approach is to deliver an architecture that enables learning, dynamic action and the ability to predict the next attack. This is possible with the real-time analytics built into InfoSphere Streams.
Terms to know:
ASN databases - Autonomous System Number (ASN) databases; an ASN uniquely identifies a network on the Internet, and ASN databases map IP address ranges to the networks that announce them.
Domain Name System (DNS) - the system that translates human-readable domain names into IP addresses.
Packet - A network packet is a formatted unit of data carried by a packet-switched network. Computer communications links that do not support packets, such as traditional point-to-point telecommunications links, simply transmit data as a bit stream.
In the field of computer network administration, pcap (packet capture) consists of an application programming interface (API) for capturing network traffic. Unix-like systems implement pcap in the libpcap library; Windows uses a port of libpcap known as WinPcap.
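As a toy illustration of the real-time correlation over DNS logs mentioned above, the sketch below (plain Python, not SPL; all names are made up for the example) flags source hosts whose query volume far exceeds an expected baseline:

```python
from collections import Counter

def flag_dns_anomalies(dns_events, baseline_rate, factor=10):
    """Flag source hosts whose DNS query count exceeds a multiple of the
    expected baseline -- a crude stand-in for real-time pattern spotting
    over DNS logs. `dns_events` is an iterable of (source, domain) pairs.
    """
    counts = Counter(src for src, _domain in dns_events)
    return sorted(src for src, n in counts.items()
                  if n > baseline_rate * factor)

events = [("10.0.0.5", "example.com")] * 3 + [("10.0.0.9", "bot.example")] * 40
print(flag_dns_anomalies(events, baseline_rate=1))  # -> ['10.0.0.9']
```

A production system would score each event as it streams in rather than counting a batch, but the idea — compare live behavior against a learned baseline and alert on deviation — is the same.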
For more: http://www-01.ibm.com/software/data/infosphere/stream-computing/smarter-healthcare.html
Healthcare worldwide is in crisis - high costs, poor or inconsistent quality, and inaccessibility are potentially catastrophic. While there is no limit to the amount of data continuously being generated in provider organizations, some lack a way to analyze and correlate the data in real time. InfoSphere Streams enables predictive analytics of data in motion for real-time decisions allowing healthcare providers to capture and analyze data - all the time, just in time. The end goal is to save lives, shorten hospital stays and build healthier communities revolving around preventative care.
Three ways InfoSphere Streams helps to save lives:
Fusing different data sources in real time: Medical devices provide visual displays of vital signs through physiological streams such as electrocardiogram (ECG), heart rate, blood oxygen saturation (SpO2), and respiratory rate. Electronic health record initiatives around the world create more sources of medical data. Life-threatening conditions such as nosocomial infection, pneumothorax, intraventricular hemorrhage and periventricular leukomalacia can be detected using analytics that fuse different data sources.
Highly personalized care: Detect signs earlier to improve patient outcomes and reduce length of stays. Automated or clinician-driven knowledge discovery to identify new relationships between data stream events and medical conditions.
Proactive treatment: Build a profile for each patient based on personalized data streams and receive insights in real time.
Hospital for Sick Kids creates first of a kind technology to help doctors care for premature babies http://www.ibm.com/software/success/cssdb.nsf/CS/SSAO-8BQ2D3?OpenDocument&Site=software&cty=en_us
UCLA tackles brain trauma to build proactive treatments during critical periods https://www.youtube.com/watch?v=bmT6i-fQLck
Emory University Hospital creates ICU of the future by analyzing over 100,000 real-time data points per second to sense early warning signs of medical complications http://www.youtube.com/watch?v=DgQheTHM5II
This slide presents an overview architecture for improving patient care. On the right side, we see various input sources such as heart monitors, respiratory monitors, blood flow monitors, brain wave activity monitors and much more. Together, these devices are streaming up to millions of events per second. Each device has its own alerting threshold. It’s a challenge for healthcare providers to know when and how to act in a sea of alarms. There are many business partners, such as those listed on this slide (Moberg, Cerner, Airstrip) that manufacture a single hardware appliance to aggregate all monitoring systems. InfoSphere Streams ingests data from the partner appliance and performs real-time analytics. A few examples of these analytics, such as oxygen saturation level, are listed on this slide.
Many healthcare institutions, such as University College Cork and the University of Montana, have PhD researchers who develop algorithms for mining healthcare data. InfoSphere Streams runs these algorithms at top speed. The result is that patterns are spotted sooner, so high-risk patients can be attended to before the onset of a threatening condition. In addition, a “super” alarm can now be established vs. many single alarms constantly going off from hundreds of patients. Results of the analytics are displayed on a wide variety of visualization platforms such as those partners listed here.
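The “super” alarm idea can be sketched in a few lines of Python (not SPL; the thresholds and field names below are illustrative only, not clinical guidance): fire one combined alarm only when several vital-sign rules trip together, instead of sounding a separate alarm for each.

```python
def super_alarm(vitals, rules):
    """Fire a single combined alarm only when at least two vital-sign
    rules trip together, reducing alarm fatigue from single-signal alerts.
    """
    tripped = [name for name, check in rules.items() if check(vitals)]
    return len(tripped) >= 2, tripped

# Illustrative thresholds on the streams named earlier in this deck.
rules = {
    "low_spo2":    lambda v: v["spo2"] < 90,   # blood oxygen saturation, %
    "tachycardia": lambda v: v["hr"] > 120,    # heart rate, beats/min
    "low_hrv":     lambda v: v["hrv"] < 20,    # heart rate variability, ms
}
fire, which = super_alarm({"spo2": 87, "hr": 130, "hrv": 45}, rules)
print(fire, which)  # -> True ['low_spo2', 'tachycardia']
```

In a streaming deployment, each new tuple of vitals would be scored as it arrives, so the combined alarm fires within milliseconds of the pattern emerging.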
Definition of healthcare terminology:
O2 = Oxygen. Oxygen saturation is a term referring to the concentration of oxygen in the blood. The human body requires and regulates a very precise and specific balance of oxygen in the blood.
HRV = Heart rate variability (HRV) is the physiological phenomenon of variation in the time interval between heartbeats. It is measured by the variation in the beat-to-beat interval.
Sepsis = Sepsis is more commonly called Blood Poisoning. Sepsis is a potentially life-threatening complication of an infection. Sepsis occurs when chemicals released into the bloodstream to fight the infection trigger inflammatory responses throughout the body. This inflammation can trigger a cascade of changes that can damage multiple organ systems, causing them to fail.
AFID = Atrial fibrillation or flutter is a common type of abnormal heartbeat. The heart rhythm is fast and irregular in this condition.
As discussed, business imperatives require a real-time response/action based on analyzing all available data continuously. This is challenging, especially given that many data sources, such as GPS data, are constantly changing and very bursty. To meet these requirements, four common tactics are often deployed, but they fall short.
Let’s take the example of a telecommunications provider to understand why these tactics fall short. Telecommunications providers need to improve network quality, prevent dropped calls and improve client satisfaction in real time. However, it isn’t always cost-effective or practical to store and then analyze all enterprise data in a data warehouse or Hadoop system.
Telecommunications providers need to:
Harness and process streaming data sources such as geospatial position and network devices
Continuously analyze and connect different silos of information such as client payment history, geospatial position and network health
Select valuable data and insights to be stored for further processing
Quickly process and analyze perishable data, and take timely action
Each approach outlined on the chart handles a part of the challenge, but not all requirements are addressed.
Business rules use if-then-else logic, but deep analytics are required
Analytic silos provide limited value; it’s more interesting to understand each analytic in the context of the others. For example, does usage history relate to payment history?
Building real-time analytic solutions in house is expensive and often less sophisticated; most organizations don’t have statisticians on staff. What about analyzing video, images or sound? Does your organization have this expertise?
Expanding the data warehouse means throwing more data at the problem; without context this isn’t helpful. Also, can you afford the time it takes to govern a wide variety of data types?
Data loss/leak prevention systems are designed to detect potential data breach/exfiltration transmissions and prevent them by monitoring, detecting and blocking sensitive data while in use (endpoint actions), in motion (network traffic), and at rest (data storage). In data leakage incidents, sensitive data is disclosed to unauthorized personnel either by malicious intent or inadvertent mistake. Such sensitive data can come in the form of private or company information, intellectual property (IP), financial or patient information, credit-card data, and other information depending on the business and the industry.
The goal is to detect and prevent unauthorized attempts to copy or send sensitive data, intentionally or unintentionally, without authorization.
Traditional security solutions classify certain information as sensitive, using techniques such as exact data matching, structured data fingerprinting, statistical methods, rule and regular expression matching, and encryption. However, emerging big data types such as mobile data can’t be properly monitored using traditional vendor solutions.
InfoSphere Streams enables organizations to intercept, analyze and monitor all data at high speed – millions of data points per second – and then trigger alerts or alarms when sensitive data is accessed or leaked inappropriately.
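The simplest of the classification techniques just mentioned, regular expression matching, can be sketched as follows (Python, not SPL; the two patterns are illustrative — real DLP systems use far richer fingerprinting):

```python
import re

# Two common sensitive-data shapes, purely for illustration.
PATTERNS = {
    "ssn":  re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),          # US SSN format
    "card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),     # 16-digit card
}

def scan_packet(payload):
    """Return the labels of any sensitive patterns found in one payload."""
    return [label for label, rx in PATTERNS.items() if rx.search(payload)]

print(scan_packet("ticket 123, SSN 123-45-6789 attached"))  # -> ['ssn']
```

In a streaming deployment, `scan_packet` would run on every payload as it crosses the wire, so an alert can fire while the transmission is still in flight rather than after the fact.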
For more: http://www-01.ibm.com/software/data/infosphere/stream-computing/smarter-automotive.html
Press release (English): http://www-03.ibm.com/press/uk/en/pressrelease/43511.wss
Press release (French): http://www.ibm.com/press/fr/fr/pressrelease/43505.wss
Increased globalization, sophisticated consumers demanding more innovative and sustainable vehicles, self-driving and connected cars, and growing regulatory and environmental requirements are putting unprecedented pressure on existing business and manufacturing models. In fact, some plug-in hybrid vehicles generate 25 GB of data in just one hour. The automotive industry is predicted to be the second largest generator of data by 2015. InfoSphere Streams can help transform the industry by enabling predictive analytics of data in motion for real-time decisions, allowing the automotive industry and its ecosystem to capture and analyze data - all the time, just in time.
Four ways InfoSphere Streams is transforming the automotive industry:
More profitable aftermarket for services and products: Create targeted offers based on driving preferences such as sound systems, child safety equipment and entertainment.
More interactive and safer driving experience: Apply brakes automatically, operate windshield wipers dynamically, deploy airbags based on weight of passengers or send offers for nearby businesses.
Integrated vehicle data for collaboration: Share data across third parties such as insurance companies, retailers and emergency medical services.
Improved quality and functionality of products: Detect problems sooner, predict breakdowns, and ensure parts are in stock to keep clients satisfied.
IBM teams with Continental to deliver the next generation driving experience: http://www.ibm.com/press/us/en/pressrelease/41922.wss
Automaker improves safety using real-time analysis of weather-based data or road-congestion alerts, watch the solution in action: http://m2m.demos.ibm.com/connectedCar.html
This slide depicts the future of transportation. Connected vehicles are truly big data machines on wheels. Modern vehicles generate myriad data: car speed, weather conditions, road status, geospatial positioning, fuel levels, tire pressure and more.
Connected cars have the potential to provide a personalized driving experience. Here’s another example. A person runs errands after work. It is useful to know the optimal path between the office and top retail locations, given time-of-day traffic and weather conditions.
InfoSphere Streams can ingest data directly from cars and trucks or from an appliance such as IBM MessageSight which aggregates data from connected vehicles (or other IoT data.) Data is transported via the MQTT protocol. MQTT is a machine-to-machine (M2M)/"Internet of Things" connectivity protocol. It was designed as an extremely lightweight publish/subscribe messaging transport. It is useful for connections with remote locations where a small code footprint is required and/or network bandwidth is at a premium.
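A defining feature of MQTT's publish/subscribe model is hierarchical topic matching with wildcards: `+` matches one topic level and `#` matches all remaining levels. The sketch below (plain Python, independent of any MQTT client library) shows roughly how a broker decides which subscriptions a message reaches; the topic names are made up for the connected-car example.

```python
def topic_matches(filter_, topic):
    """Minimal MQTT-style topic-filter matching: '+' matches exactly one
    level, '#' (allowed only as the last level) matches any remainder."""
    f_parts, t_parts = filter_.split("/"), topic.split("/")
    for i, f in enumerate(f_parts):
        if f == "#":
            return True            # multi-level wildcard swallows the rest
        if i >= len(t_parts):
            return False           # topic ran out of levels
        if f != "+" and f != t_parts[i]:
            return False           # literal level mismatch
    return len(f_parts) == len(t_parts)

print(topic_matches("car/+/speed", "car/VIN123/speed"))        # -> True
print(topic_matches("car/#", "car/VIN123/tires/pressure"))     # -> True
print(topic_matches("car/+/speed", "car/VIN123/fuel"))         # -> False
```

This level-by-level design is part of why MQTT stays so lightweight: routing a message requires only string splits and comparisons, which suits constrained vehicle hardware and narrow networks.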
Upon ingestion, the data is analyzed in real time using sophisticated techniques like spatiotemporal and predictive analytics. Analysts from insurance firms or the automakers can watch this data and respond with the right parts to stock in the case of a breakdown, or provide automatic discounts for intelligent drivers. After the real-time analysis is completed, the data and results can then be sent to a landing zone such as Hadoop. Using InfoSphere BigInsights, organizations can build custom applications using data from connected vehicles.
InfoSphere Streams enables analytics for these and other use cases, empowering a broad ecosystem including car manufacturers, retailers, insurance companies, trucking companies and consumers to be safer and more productive on the road.
Real-time analytics are used both during and after the manufacturing process to achieve exceptional outcomes, including:
• Profitable aftermarket services and products
• Improved, interactive driving experience and safety by real-time analysis of weather-based data or road-congestion alerts
• Integrated vehicle data available to third parties such as insurance companies, retailers and emergency medical services
• Improved quality and functionality of future products
• Optimization of the global value chain to improve the environment
For more: http://www-01.ibm.com/software/data/infosphere/stream-computing/smarter-telco.html
Fueled by rapid adoption in developing countries, mobile communications have become the industry's highest priority and are driving rapid change. The rapid emergence of smart phones and 3G/4G networks has resulted in widespread SMS usage, cell-phone-based internet access and more wireless phone calls. The influx of data could be overwhelming, but smart telecommunications providers are turning this data into actionable insight. InfoSphere Streams enables predictive analytics of data in motion for real-time decisions, allowing telecommunications providers to capture and analyze data - all the time, just in time.
Four ways InfoSphere Streams helps telecommunications providers keep pace:
Processing of call data in real time to predict customer churn and fraud: Process CDRs and IPDRs to predict and prevent customer churn proactively and help filter SMS spam and SMS fraud in real-time.
Timely marketing promotions and ability to analyze success in real-time: Trigger promotions to a selected set of customers within the subscriber list based on a predefined set of business rules. Determine the success of promotions within minutes and take necessary corrective actions.
High utilization of expensive network assets: Initiate region specific real-time marketing promotions to ensure better utilization of expensive network infrastructure equipment. Understand geospatial location of the callers and target them effectively.
Incremental revenue from newer marketing promotions: Provide a platform to run powerful geospatial analytics on subscribers to better understand their location patterns and cross sell / upsell additional services and promotions from partner vendors.
Sprint accesses and analyzes call, internet usage and texting detail records in real-time http://www.youtube.com/watch?feature=player_embedded&v=eg8KSLAZ2HM
An Indian telecommunications provider reduces processing time from 12 hrs to 1 min and now analyzes 7B CDR/day
Consolidated Communications uses predictive insights to save $300,000 USD/year http://www.ibm.com/common/ssi/cgi-bin/ssialias?subtype=AB&infotype=PM&appname=SWGE_IM_EZ_USEN&htmlfid=IMC14842USEN&attachment=IMC14842USEN.PDF
Telecommunications data contains insight into outages and the events that precipitate those outages. Customers in the telecommunications industry face the challenge of performing real-time mediation and analytics on large volumes of Call Detail Records (CDR). IBM Accelerator for Telecommunications Event Data Analytics offers these customers a grammar-based parser generator and a reliable file processing system that prevents data loss and duplication. These features enable customers to import and analyze raw telecommunications data in real time, and then transform that data into meaningful and actionable insight.
In addition to a master script that starts, stops, and controls IBM Accelerator for Telecommunications Event Data Analytics, a typical IBM Accelerator for Telecommunications Event Data Analytics workflow consists of: Importing data files (the CDRs)
Scanning and parsing the input files
Extracting, enriching, and transforming the files
Removing duplicate CDRs
Either aggregating the data for statistics or writing the CDRs to a repository.
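The workflow steps above — import, deduplicate, then aggregate or store — can be sketched in miniature (Python, not the accelerator's actual SPL implementation; the CDR field names below are illustrative):

```python
def process_cdrs(cdrs):
    """Deduplicate CDRs by record id, then aggregate total minutes per
    caller -- a toy version of the dedup-and-aggregate steps above.
    `cdrs` is an iterable of dicts with 'id', 'caller', 'minutes' keys."""
    seen, totals = set(), {}
    for cdr in cdrs:
        if cdr["id"] in seen:        # duplicate record: drop, don't double-count
            continue
        seen.add(cdr["id"])
        totals[cdr["caller"]] = totals.get(cdr["caller"], 0) + cdr["minutes"]
    return totals

cdrs = [
    {"id": 1, "caller": "A", "minutes": 5},
    {"id": 2, "caller": "B", "minutes": 3},
    {"id": 1, "caller": "A", "minutes": 5},   # duplicate, ignored
    {"id": 3, "caller": "A", "minutes": 2},
]
print(process_cdrs(cdrs))  # -> {'A': 7, 'B': 3}
```

At telecommunications scale the "seen" set would be a bounded, time-windowed structure rather than an ever-growing Python set, but the dedup-before-aggregate ordering is the essential point: a duplicate CDR that slips through would inflate every downstream statistic.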
Telecommunications event data is increasing in volume, and as a service provider, you must quickly identify and resolve network quality issues to maintain high service levels and subscriber experience and to increase profits. IBM Accelerator for Telecommunications Event Data Analytics enables you to perform real-time mediation and analysis on large volumes of Call Detail Records, or CDRs, and event detail records. IBM Accelerator for Telecommunications Event Data Analytics is designed to handle exponential growth in traffic and allows you to use your current telecommunications-related service assets to deliver the deep, critical insights that support these business goals.
Terms to know:
Abstract Syntax Notation One (ASN.1) is a standard and notation that describes rules and structures for representing, encoding, transmitting, and decoding data in telecommunications and computer networking. The formal rules enable representation of objects that are independent of machine-specific encoding techniques.
This chart provides a very detailed architecture for next best offers using telecommunications data. InfoSphere Streams fits into this picture by analyzing high volume, high velocity data; it acts as a pre-processing filter to various landing zones.
Real-time marketing is marketing performed "on the fly" to determine an appropriate or optimal approach to a particular customer at a particular time and place. It is a form of inbound marketing that seeks the most appropriate offer for a given customer sales opportunity, reversing traditional outbound marketing (or interruption marketing), which aims to acquire appropriate customers for a given pre-defined offer. The dynamic 'just-in-time' decision making behind a real-time offer aims to exploit a given customer interaction, defined by web-site clicks or phone usage.
For more: http://www-01.ibm.com/software/data/infosphere/stream-computing/smarter-insurance.html
Changes facing insurance providers such as deregulation, increased competition, advances in technology and globalization combine to exert substantial pressure on insurers, brokers, asset managers and reinsurers, and on their ability to respond to these changes. InfoSphere Streams turns these pressures into opportunity and enables predictive analytics of data in motion for real-time decisions allowing insurers to capture and analyze data - all the time, just in time.
Four ways InfoSphere Streams helps insurers become more competitive:
Real-time telematic analysis: Create real-time dashboards of behaviors such as car speed and locations to automatically adjust risk scores.
Speedy fraud detection: Receive incident reports as they happen and immediately feed them into claims processes.
Cargo protection: Predict accidents or disasters in real time, dynamically update risk models and ensure informed underwriting.
Call center optimization: Improve client experience, quality and performance. Automate next best actions and increase automated responses.
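The real-time telematic analysis bullet above — adjusting a risk score from streamed speed and location data — can be illustrated with a deliberately simple sketch (Python, hypothetical weighting; real actuarial models are far richer):

```python
def risk_score(speed_events, limit=100):
    """Accumulate a driver risk score from a stream of (speed, location)
    events: each over-limit reading bumps the score in proportion to the
    excess. The 0.1 weight and the limit are illustrative only."""
    score = 0.0
    for speed, _location in speed_events:
        if speed > limit:
            score += (speed - limit) * 0.1
    return round(score, 1)

events = [(95, "A1"), (120, "A2"), (105, "A3")]
print(risk_score(events))  # -> 2.5
```

Because the score is a running accumulation over the event stream, a dashboard can show it updating continuously as telematics data arrives, rather than recomputing it from stored history at renewal time.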
International port in the Pacific Ocean able to identify illegal cargo in real time
One insurer delivers customized services based on simple "utterances" from clients, rather than full sentences or specific commands
Insurer receives insight in milliseconds about changes in weather
This chart provides a very detailed architecture for big data telematics. InfoSphere Streams fits into this picture by analyzing high volume, high velocity data; it acts as a pre-processing filter to various landing zones.
For more: http://www-01.ibm.com/software/data/infosphere/stream-computing/smarter-utilities.html
Traditional business models for the utilities industry are losing relevance. The energy production and delivery industry is placing many more smart sensors and meters along production, transmission and distribution systems to get granular real-time data about the current state of faults and load. Powerful analytics on this data, when combined with other sources such as Outage and Distribution Management Systems (OMS/DMS), weather data, third-party event monitoring systems, and Meter Data Management Systems (MDMS), can help utilities take necessary actions to avoid electric grid failures, improve security, and optimize capacity and redundancy. InfoSphere Streams enables this predictive analytics, allowing energy and utility providers to capture and analyze data - all the time, just in time.
Four ways InfoSphere Streams supports smart grid:
Outage detection and prediction: Monitor grid/plant elements and networks and rapidly predict and analyze data to detect grid/plant outages.
Load shedding: Monitor and run powerful real-time analytics on data from smart meters and sensors.
Condition based maintenance: Operationalize Condition Based Maintenance (CBM) and identify assets that are likely to fail in the near term or require maintenance or operational changes. Take action preemptively to control or repair equipment.
Smarter analytics: Run extremely powerful analytics that take structured real-time data from smart meters as well as unstructured data like satellite imagery feeds, weather forecasts and PMU readings, for a variety of uses such as price fluctuation forecasting, energy trading insights and more.
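A tiny sketch of the outage-detection idea from the list above (Python; meter names, timestamps and the timeout are all made up for the example): a meter that stops reporting is itself a signal worth alerting on.

```python
def detect_outage(last_report, now, timeout=300):
    """Flag meters that have gone silent for longer than `timeout`
    seconds -- a simple proxy for streaming outage detection over
    smart-meter data. `last_report` maps meter id -> last report time."""
    return sorted(m for m, t in last_report.items() if now - t > timeout)

last = {"meter-1": 990, "meter-2": 400, "meter-3": 985}
print(detect_outage(last, now=1000))  # -> ['meter-2']
```

Run continuously against the live meter feed, a check like this localizes an emerging outage to the silent meters' segment of the grid before customers start calling.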
The Pacific Northwest smart grid project services 60,000 homes across five states. It enables towns to avoid power outages using a two-way advanced meter system and empowers consumers to make educated choices about how and when to use electricity. The solution provides increased grid efficiency and reliability through system self-monitoring and feedback.
Battelle reduces energy costs and enhances power grid reliability and performance http://www.ibm.com/common/ssi/cgi-bin/ssialias?subtype=AB&infotype=PM&appname=SWGE_IM_ZN_USEN&htmlfid=IMC14785USEN&attachment=IMC14785USEN.PDF
CenterPoint Energy powers 2.3M Smart Meters with IBM InfoSphere Streams https://www.youtube.com/watch?v=Oz77KOAfRZY
This chart provides a very detailed architecture for the smart grid. InfoSphere Streams fits into this picture by analyzing high volume, high velocity data; it acts as a pre-processing filter to various landing zones.
Context-aware stream computing is a different paradigm – the left shows the traditional way data is accessed using queries to pull the data from a data storage device such as a data warehouse or database – which is still valid for many requirements.
The new context-aware stream computing paradigm brings data to the query – data is pushed or flows through the analytics.
Common drivers for those new use cases include:
When you need an immediate response/action and persisting and analyzing stored data isn’t fast enough.
When it is too expensive to store the data to be analyzed – e.g. most of it is throw-away, and it’s more efficient to analyze/filter as you receive it and store only the filtered results.
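The push paradigm and its "filter before you store" driver can be sketched together (Python, not SPL; the predicate and store are stand-ins): data flows through the query as it arrives, and only the tuples that pass are persisted.

```python
def filter_then_store(stream, predicate, store):
    """Push-paradigm sketch: each tuple flows through the standing query
    as it arrives; only tuples that pass the filter are persisted, so
    most throw-away data never touches storage. Returns the kept count."""
    kept = 0
    for item in stream:
        if predicate(item):
            store.append(item)
            kept += 1
    return kept

store = []
n = filter_then_store(range(100), lambda x: x % 10 == 0, store)
print(n, store)  # -> 10 [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
```

Contrast this with the traditional pull model on the left of the slide, where all 100 items would be written to the warehouse first and the query would run against them later.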
There are many resources for additional reading. Explore both business and technical resources. All resources are publicly accessible.