Parallel driving in CPSS: a unified approach for transport automation and vehicle intelligence

The emerging development of connected and automated vehicles imposes a significant challenge on current vehicle control and transportation systems. This paper proposes a novel unified approach, Parallel Driving, a cloud-based cyber-physical-social systems (CPSS) framework aiming at synergizing connected automated driving. This study first introduces the CPSS and ACP-based intelligent machine systems. Then parallel driving is proposed in the cyber-physical-social space, considering interactions among vehicles, human drivers, and information. Within the framework, parallel testing, parallel learning and parallel reinforcement learning are developed and concisely reviewed. The development of the intelligent horizon (iHorizon) and its applications are also presented towards the parallel horizon. The proposed parallel driving offers an ample solution for achieving smooth, safe and efficient cooperation among connected automated vehicles with different levels of automation in future road transportation systems.

the framework of parallel driving has been steadily conceived, as seen in [8]−[11]. This has also been significantly motivated by the emerging development in connected and automated vehicles [12]−[14].
Different levels of vehicle automation have recently been defined and recommended by SAE (2014), Germany's Federal Highway Research Institute (BASt, 2012) and the US National Highway Traffic Safety Administration (US NHTSA, 2013) [15]−[17], where a distinct jump in automation levels occurs between Level 2 and Level 3 automation in the SAE definition. For Level 2 automation, the driver is required to continuously monitor the driving situation, while for Level 3 the automated driving system monitors the driving situation, so the driver is allowed to be fully disengaged from the driving task. However, if requested, the driver must be ready to take over within a certain period of time.
Current automotive technology advances primarily at Level 1 and partially at Level 2, with several commercial products available, such as adaptive cruise control (ACC) for Level 1 automation, and BMW's Traffic Jam Assistant, GM's Super Cruise, Mercedes' Distronic Plus with Steering Assist, Toyota's Automated Highway Driving Assist, Volvo's ACC with Steer Assistance and the Tesla Model S for Level 2 automation [12], [18].
Thanks mainly to the DARPA Challenge, there have been substantial technological developments at Levels 4 and 5 (or full automation) [19]−[22] in the past decade, as also reflected in the Google self-driving cars. One of the ongoing challenges for fully autonomous driving is reliable and robust operation in more complex real-world driving environments, such as those found in urban driving (e.g., [13], [23]). In parallel, vehicle platoons or cooperative vehicle automation have also been investigated for a few decades, further enhancing vehicle safety, energy efficiency and highway capacity (e.g., the PATH project [24], along with more recent efforts [14], [25]).
The European roadmap [26] has suggested three main automated driving milestones up to 2030: Level 3 automation available at low speed and in less complex driving scenarios by 2020, and full autonomy available on highways by 2025 and in urban areas by 2030, significantly enabled by advances in connected vehicle technologies [12], [14].
It can therefore be foreseen that in the coming two to four decades (e.g., up to 2050), our road transportation system will consist of a mix of connected vehicles with different levels of automation, which necessitates a unified approach for future smart and safe driving. This considerably motivates the development of CPSS-based parallel driving. Section II outlines the CPSS and the ACP-based intelligent machine framework. Section III proposes parallel driving in CPSS. Section IV presents parallel testing, parallel learning and parallel reinforcement learning. The development of the intelligent horizon and its applications is presented towards the parallel horizon concept in Section V. Concluding remarks are then given in Section VI.

II. CYBER-PHYSICAL-SOCIAL SPACE AND ACP-BASED INTELLIGENT MACHINE FRAMEWORK
Cyber-physical systems (CPS) have attracted increasing attention over the past two decades, while CPSS augments the CPS capacity by integrating an additional dimension, namely human and social characteristics, so as to achieve more effective CPS design and operation [6]. This augmentation also has a philosophical implication, being in line with Karl Popper's theory of reality, which states that three interacting worlds coexist in our universe: the physical world, the mental world, and the artificial world, as shown in Fig. 1.
These three worlds are coupled by physical space and cyber space respectively, so as to conceive the cyber-physical-social space (CPSS). Rapid development in ICT enables us to exploit more in the artificial world so as to design and optimize the systems in the physical and mental worlds. The ACP theory and approach (Fig. 2) has been developed by Fei-Yue Wang and his research group since 2004 [1]−[7], aiming at the modelling, analysis and control of complex systems, such as systems in the CPSS:

ACP = Artificial societies + Computational experiments + Parallel execution

Fig. 3 depicts the framework of the ACP-driven intelligent machine. In the framework, the physically-defined machine (also called the Newton machine) interacts with the software-defined machine (also called the Merton machine) through three coupled modules, namely management and control, experiment and evaluation, and learning and training, within the cyber-physical-social spaces. This parallel execution between the physically- and software-defined machines is expected to enable an optimal operation of the machines [27], [28].

III. PARALLEL DRIVING IN CPSS
For future connected automated driving, three main elements exist: the physical vehicle (with physical attributes), the human driver (with both physical (e.g., neuromuscular dynamics) and cognitive (e.g., attention, intention) attributes), and the control and information related to driving (artificial). According to the ACP approach, these three road driving elements can be naturally projected into the three parallel worlds, namely the physical world, the mental world, and the artificial world, as seen in Fig. 4, which conceives the CPSS-based parallel driving framework [29]−[31].
Fig. 4 presents the three levels of worlds co-existing in parallel: physical (Level I), mental (Level II), and artificial (Level III). The artificial world consists of two layers. Level IIIb refers to CPSS services consisting of three components: people (social web), place/location (geo web) and technology (sensors, Internet of Things, etc.), similar to that proposed in [32]. In addition, in parallel driving, the artificial world is enhanced with a dedicated driving layer, namely Level IIIa for artificial drivers and artificial vehicles (ADAV), which is designed to realize the "computational experiments" and "parallel execution" elements within the ACP approach [7].
Furthermore, each vehicle is assigned an ADAV control module, which communicates with the artificial world and other ADAV modules, interacts with the human driver in the mental world (monitoring driver state/behavior/intention, e.g., [33]−[36], joint cognition [37], driver-vehicle shared control [38]−[40], etc.), and operates the physical vehicle in the physical world.
Human drivers are active in both the mental and physical worlds, but if the fully autonomous driving function is activated, a driver's driving-related physical behaviors are not needed. However, if the driver intends to take over control, the vehicle will switch to a lower level of automation, where the driver's physical behaviors have to re-engage in the driving tasks. Thus different HD-RD-ADAV units could suggest very different automation-driving patterns (e.g., from Level 0 to Level 5) with possibly frequent shifting among different automation levels during real-world driving [41], [42].
The fundamental principle of parallel driving is that the Level IIIa ADAV layer, together with the allocated individual ADAVs, deals with the complex automated driving while keeping the real vehicles themselves as simple as possible. Fig. 5 outlines the ACP-driven synergy between physical vehicles and virtual vehicles through the interactive prescriptive, predictive and descriptive intelligence. The next section focuses on the development of the subsystems: parallel testing, parallel learning and parallel reinforcement learning.

Fig. 5. ACP-based synergy between the physically built and software-defined virtual vehicles through the interactive prescriptive, predictive and descriptive intelligence within CPSS.

IV. PARALLEL TESTING, PARALLEL LEARNING AND PARALLEL REINFORCEMENT LEARNING
A major problem that hinders automated vehicle design is the lack of testing data. Due to the complexity of traffic scenarios, it is hardly possible to collect all the data needed to train or test the automated vehicles that we build. To address this challenge, we have proposed and developed the so-called parallel learning theory for automated vehicle design [43], [44].
As shown in Fig. 6, the theoretical framework of parallel learning consists of two parts. Above the dotted line is the data preprocessing stage based on the software-defined artificial system. Below the dotted line are predictive learning and ensemble learning based on computational experiments, together with parallel control and prescriptive learning. Fine arrows represent data generation or data learning, and bold arrows represent interactions between action and data.

In the data processing stage, the parallel learning method first selects specific "small data" from the original data, feeds them into the software-defined artificial system, and generates a large amount of new data from the artificial system. These artificial data, together with the specific raw data, form a set of "big data" that is used for updating the machine learning model. For example, GAN models can be used to build virtual videos for testing automated vehicles. Fig. 7 demonstrates our development of a simulation test platform, which includes parallel traffic systems. The upper left figure shows the cyclic updating method of the co-evolution between the real testing ground and the parallel virtual testing ground, while the upper right figure illustrates the mapping of real view data onto the virtual space to facilitate testing.

Because labeled data are needed to train automated vehicles, we use predictive learning carried out on the parallel machine to self-label the required data. Predictive learning originated from the interpretation in cognitive psychology of children's learning styles. Its core is to model the real environment inside machines, simulate and predict the possible future, and learn how the world works by observing and demonstrating. The simulation is unsupervised or semi-supervised, while the initial states and the final results are supervised.

A recently developed example of predictive learning for parallel vehicles is "parking like human" [45]. In this example, we aim to teach automated vehicles the general parking skills of humans. Usually, an autonomous vehicle first plans a trajectory that links the start and destination states. Then, it determines a sequence of steering actions to make sure that the vehicle moves along this trajectory toward the destination state. The major problem of these trajectory planning methods is that we must fully consider the vehicle dynamics, which heavily affect the geometric properties of the candidate trajectories.
Parallel learning provides an alternative way to solve this problem by directly bridging the actual parking trajectories and the steering actions to find the best parking trajectory. First, we sample a large number of vehicle parking trajectories that can be driven within a certain time period and build a deep neural network to remember all these trajectories (supposing that the start points of all these trajectories have been shifted to the same origin state). From the viewpoint of parallel learning, this is the self-labeling process.
Specifically, the input of this neural network is the destination state, and its output is the corresponding control actions together with the corresponding trajectory (Fig. 8). If more than one set of control actions can lead to the same destination state, we only let the neural network remember the one with the optimal cost (time, energy, etc.). Each time a target destination state is known, we let the trained neural network directly recall the needed control actions [46].

Furthermore, to handle the integrated data from the artificial system and the computational experiments, we combine the parallel learning and deep reinforcement learning approaches to propose the parallel reinforcement learning (PRL) theory for automated vehicle design [47]. The framework of the PRL is depicted in Fig. 9.
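The trajectory-recall idea described above can be illustrated with a minimal Python sketch. It is not the method of [45], [46]: a dictionary with nearest-neighbor lookup stands in for the deep neural network, and the kinematic bicycle model, the steering-effort cost, the grid cell sizes and all function names (`simulate`, `cost`, `build_memory`, `recall`) are assumptions made for illustration only.

```python
import math
import random


def simulate(actions, dt=0.5, speed=-1.0, wheelbase=2.5):
    """Roll an assumed kinematic bicycle model from the common origin state.

    `actions` is a sequence of steering angles (rad); a negative speed
    means the vehicle is reversing into the parking spot.
    """
    x, y, heading = 0.0, 0.0, 0.0
    for steer in actions:
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
        heading += speed / wheelbase * math.tan(steer) * dt
    return x, y, heading


def cost(actions):
    """Hypothetical cost: total steering effort of the action sequence."""
    return sum(abs(a) for a in actions)


def build_memory(n_samples=2000, horizon=10, cell=0.5, seed=0):
    """Sample trajectories and keep the cheapest one per destination cell."""
    rng = random.Random(seed)
    memory = {}
    for _ in range(n_samples):
        actions = [rng.uniform(-0.5, 0.5) for _ in range(horizon)]
        x, y, heading = simulate(actions)
        # Self-labeling: the simulated destination labels the action sequence.
        key = (round(x / cell), round(y / cell), round(heading / 0.2))
        c = cost(actions)
        if key not in memory or c < memory[key][0]:
            memory[key] = (c, actions, (x, y, heading))
    return memory


def recall(memory, target):
    """Return the stored actions whose destination is closest to `target`."""
    best = min(
        memory.values(),
        key=lambda entry: sum((p - q) ** 2 for p, q in zip(entry[2], target)),
    )
    return best[1]
```

A call such as `recall(build_memory(), (-4.5, 0.5, -0.1))` returns the remembered steering sequence for the closest stored destination. A trained network would generalize between stored destinations, which this lookup table does not attempt.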
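In the data processing stage of deep reinforcement learning, the big data is modeled as a Markov decision process (MDP) $(S, A, f, R)$, where $S=\{s(t)\}$ and $A=\{a(t)\}$ are the sets of state variables and control actions, $f$ is the state transition probability density function, and $R=\{r(s, a)\}$ is the reward function. The optimal value function is expressed as the finite expected discounted sum of the rewards

$$V^{*}(s) = \max_{\pi}\, \mathbb{E}_{\pi}\left[\sum_{t=t_0}^{t_f} \mu^{\,t-t_0}\, r(s(t), a(t))\right] \tag{1}$$

where the control policy $\pi$ is the distribution over the control actions, $T=[t_0, t_f]$ is the entire time interval, and $\mu\in(0, 1)$ is a discount factor. To derive the optimal control action for the current state variable, (1) is reformulated recursively as follows

$$V^{*}(s) = \max_{a\in A}\left[r(s, a) + \mu\sum_{s'\in S} p_{sa,s'}\, V^{*}(s')\right] \tag{2}$$

where $p_{sa,s'}$ denotes the transition probability from state $s$ to the next state $s'$ when taking action $a$. Once the optimal value function is determined, the optimal control policy is computed as

$$\pi^{*}(s) = \arg\max_{a\in A}\left[r(s, a) + \mu\sum_{s'\in S} p_{sa,s'}\, V^{*}(s')\right] \tag{3}$$

Furthermore, the action-value function $Q(s, a)$ and its optimal value $Q^{*}(s, a)$ are defined as

$$Q(s, a) = r(s, a) + \mu\sum_{s'\in S} p_{sa,s'}\, Q(s', a'), \qquad Q^{*}(s, a) = r(s, a) + \mu\sum_{s'\in S} p_{sa,s'}\, \max_{a'} Q^{*}(s', a') \tag{4}$$

where $a'$ is the action taken in the next state $s'$. In the deep Q-network method, the action-value function $Q(s, a)$ is represented by the Q-network with weights $w$ as

$$Q(s, a) \approx Q(s, a; w) \tag{5}$$

The mean squared error (MSE) is written as

$$L(w) = \mathbb{E}\left[\left(r + \mu\max_{a'} Q(s', a'; w^{-}) - Q(s, a; w)\right)^{2}\right] \tag{6}$$

where $w^{-}$ denotes the weights of an older copy of the Q-network. Similarly, the MSE in double Q-learning (DQL) is rewritten as follows

$$L(w) = \mathbb{E}\left[\left(r + \mu\, Q\big(s', \arg\max_{a'} Q(s', a'; w);\, w^{-}\big) - Q(s, a; w)\right)^{2}\right] \tag{7}$$

in which the current Q-network is used to select actions and the older Q-network is used to evaluate actions.

The deep deterministic policy gradient (DDPG) is the continuous analogue of the deep Q-network. The MSE in DDPG is described as

$$L(w) = \mathbb{E}\left[\left(r + \mu\, Q\big(s', \pi(s');\, w^{-}\big) - Q(s, a; w)\right)^{2}\right] \tag{8}$$

where the next action is supplied by a deterministic policy $\pi(s')$ instead of a maximization over actions. Finally, the special knowledge derived from the PRL can be applied to the feedback control of the parallel artificial system, and can also be utilized for indicative control of the real dynamic system.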
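To make the recursion and the two Q-learning targets described above more concrete, the following Python sketch may help. It is a tabular illustration under stated assumptions, not the PRL implementation of [47]: rewards are maximized, the horizon is treated as infinite for simplicity, Q-values are plain lists rather than network outputs, and every function name (`backup`, `value_iteration`, `dqn_target`, `double_dqn_target`) is hypothetical.

```python
def backup(P, R, V, s, a, discount):
    """One-step look-ahead: r(s, a) + mu * sum_s' p(s' | s, a) * V(s')."""
    return R[s][a] + discount * sum(p * V[s2] for s2, p in P[s][a])


def value_iteration(P, R, discount, sweeps=200):
    """Iterate the recursive value equation on a small tabular MDP.

    P[s][a] is a list of (next_state, probability) pairs and
    R[s][a] is the immediate reward of action a in state s.
    """
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(sweeps):
        V = [
            max(backup(P, R, V, s, a, discount) for a in range(len(P[s])))
            for s in range(n_states)
        ]
    # Greedy policy with respect to the converged value function.
    policy = [
        max(range(len(P[s])), key=lambda a: backup(P, R, V, s, a, discount))
        for s in range(n_states)
    ]
    return V, policy


def dqn_target(reward, q_next_old, discount):
    """Deep Q-network target: the older network both selects and evaluates."""
    return reward + discount * max(q_next_old)


def double_dqn_target(reward, q_next_current, q_next_old, discount):
    """Double Q-learning target: the current network selects the action,
    the older network evaluates it."""
    best = max(range(len(q_next_current)), key=q_next_current.__getitem__)
    return reward + discount * q_next_old[best]


def squared_error(target, q_value):
    """Per-sample squared error; its mean over a batch gives the MSE."""
    return (target - q_value) ** 2
```

When the two networks disagree about the best next action, the double Q-learning target is never larger than the deep Q-network target, which is the usual motivation for separating action selection from action evaluation.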

V. PARALLEL HORIZON: DEVELOPMENT OF INTELLIGENT HORIZON AND ITS APPLICATIONS
Based on the current infrastructure, and with the future perspective of different automation levels coexisting in common environments, the intelligent horizon (iHorizon) has been developed towards a long-term realization of the parallel horizon.
The novel iHorizon framework takes a step forward from the current eHorizon technology made available by Bosch [48], Continental [49] and HERE [50] by requiring less information, which is further exploited using machine learning. Furthermore, it provides a dynamic prediction of the speed profile, which is suitable for further applications related to safety and energy consumption, and includes the driver as an essential component to improve the prognosis results.
As shown in Fig. 10, the baseline iHorizon consists of three main modules. First, the driving style recognition (DSR) algorithm is used to identify the driver and classify the driving style, on a continuous index, into calm, normal and aggressive clusters [51], [52]. This information is used in the second and third modules to provide long-term and short-term predictions of the speed profile [53], [54].

A. Driving Style Recognition Algorithm
Driving style is essential to guarantee safe and efficient driving in Level 0 automation vehicles, and is an important requirement for human acceptance of higher levels of automation [55], [56]. The eco-driving index in present-day vehicles and driving by demonstration are examples that support the relevance of driving style for autonomous vehicle development [57]−[60].
DSR is designed using unsupervised machine learning, as introduced in [51]. Unsupervised methods encourage generalization of the results and support their objectivity. In the context of the iHorizon, driving style not only contributes strongly to safety improvement, but also highly affects acceleration profiles and energy consumption [57], [58].
Fig. 11 illustrates the DSR development from real-world testing data using an experimental car. The design process consists of a first stage of relevant signal selection using the k-means algorithm, as included in Table I, and a second stage of final cluster selection and algorithm definition using a Gaussian mixture model. Whilst k-means uses rigid margins to classify driving styles as calm, normal or aggressive, the Gaussians allow soft margins and provide better classification accuracy as well as consistency by implementing the so-called Expectation-Maximization algorithm, as included in Table II.
Further details can be found in [49]. The data used consist of driving-style-specific speed profiles from rural, urban and highway roads with feedback from the driver. Unsupervised learning allows the data to be handled independently of the subjective labels, which are only used to test the consistency between the algorithm output and the driver perception.
DSR is able to return a continuous index from 0 (very calm) to 1 (very aggressive) and to provide a three-class driving style classification. The driving style is used as an input for future speed prediction in the second and third modules.

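TABLE I
K-MEANS ALGORITHM
Step 1: Initialize $\mu_k$ randomly.
Step 2: Minimize $J$ with respect to $r_{nk}$ keeping $\mu_k$ fixed, where $J = \sum_n \sum_k r_{nk} \|x_n - \mu_k\|^2$ for data points $x_n$. The data points are classified with the current means. Calculate for all data points and means:
$$r_{nk} = \begin{cases} 1, & \text{if } k = \arg\min_j \|x_n - \mu_j\|^2 \\ 0, & \text{otherwise} \end{cases}$$
Step 3: Minimize $J$ with respect to $\mu_k$ keeping $r_{nk}$ fixed. Update the mean values to locate them at the minimum distance to all elements in the cluster. Considering the total number of elements in a cluster, its mean is updated as follows:
$$\mu_k = \frac{\sum_n r_{nk}\, x_n}{\sum_n r_{nk}}$$
Repeat Steps 2 and 3 until convergence.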
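The steps of Table I can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the implementation used in [51]: the means are initialized by sampling distinct data points (one possible form of random initialization), points are tuples of floats, and the names `sq_dist` and `kmeans` are chosen here for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Alternate the assignment and mean-update steps until the means settle."""
    rng = random.Random(seed)
    # Step 1: random initialization, here by picking k distinct data points.
    means = rng.sample(points, k)
    labels = []
    for _ in range(iterations):
        # Step 2: assign each point to its nearest mean (r_nk = 1 for that k).
        labels = [
            min(range(k), key=lambda j: sq_dist(p, means[j])) for p in points
        ]
        # Step 3: move each mean to the centroid of its assigned points.
        new_means = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroid = tuple(sum(c) / len(members) for c in zip(*members))
                new_means.append(centroid)
            else:
                new_means.append(means[j])  # keep the mean of an empty cluster
        if new_means == means:  # converged: assignments can no longer change
            break
        means = new_means
    return means, labels
```

For driving style data, each point could be a tuple of candidate signals (for example mean speed and mean absolute acceleration, both hypothetical choices here), with `k=3` for the calm, normal and aggressive clusters.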

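TABLE II
EXPECTATION-MAXIMIZATION ALGORITHM FOR GAUSSIAN MIXTURES
Step 1: Initialize parameters: $\mu_k$, $\pi_k$ and $\Sigma_k$.
Step 2: E-Expectation. Evaluate posterior probabilities with the current parameters:
$$\gamma(z_{nk}) = \frac{\pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_j \pi_j\, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}$$
Step 3: M-Maximization. Re-calculate the parameters using the current posterior probability values:
$$\mu_k^{\text{new}} = \frac{1}{N_k}\sum_n \gamma(z_{nk})\, x_n, \qquad \Sigma_k^{\text{new}} = \frac{1}{N_k}\sum_n \gamma(z_{nk})\,(x_n - \mu_k^{\text{new}})(x_n - \mu_k^{\text{new}})^{T}, \qquad \pi_k^{\text{new}} = \frac{N_k}{N}$$
with $N_k = \sum_n \gamma(z_{nk})$ and $N$ the total number of data points.
Iterate Steps 2 and 3 until convergence of either the likelihood or the parameter values.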
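As a companion to Table II, the sketch below runs the E and M steps for a one-dimensional Gaussian mixture in plain Python. It is a simplified illustration, not the DSR implementation: the data are scalar (so each covariance reduces to a variance), a small variance floor is added as an assumed safeguard, and the names `gaussian_pdf` and `em_gmm_1d` are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, max_iter=200, tol=1e-9):
    """Fit a 1-D Gaussian mixture by expectation-maximization."""
    # Step 1: copy the initial parameters so the caller's lists are untouched.
    means, variances, weights = list(means), list(variances), list(weights)
    k, n = len(means), len(data)
    previous = None
    resp = []
    for _ in range(max_iter):
        # Step 2 (E): posterior probability of each component for each point.
        resp, log_likelihood = [], 0.0
        for x in data:
            joint = [
                weights[j] * gaussian_pdf(x, means[j], variances[j])
                for j in range(k)
            ]
            total = sum(joint)
            resp.append([v / total for v in joint])
            log_likelihood += math.log(total)
        # Stop once the log-likelihood no longer improves noticeably.
        if previous is not None and abs(log_likelihood - previous) < tol:
            break
        previous = log_likelihood
        # Step 3 (M): re-estimate the parameters from the posteriors.
        for j in range(k):
            n_j = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / n_j
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / n_j
            variances[j] = max(var, 1e-6)  # assumed floor against collapse
            weights[j] = n_j / n
    return means, variances, weights, resp
```

The returned posteriors are soft memberships, which is what allows a continuous driving style index rather than the hard assignment produced by k-means.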

B. Future Speed Prediction
The second and third modules receive driving style information and predict the future speed over cycle-length and short-term horizons. They incorporate information about the road type readily available in a commonly used GPS device, and can handle minimal information about the traffic conditions [53], [54]. In both cases, the speed and acceleration are obtained using a two-dimensional Markov chain, which uses the current speed and acceleration values to generate the next state. The chain consists of an array of random variables, $z$, that are conditionally independent and therefore allow for major model simplification, as included in (9). The transitions are stored in a transition probability matrix (10) whose components are obtained from data with (11) [61]−[64].

$$P(z_1, \ldots, z_M) = P(z_1)\prod_{m=1}^{M-1} P(z_{m+1} \mid z_m) \tag{9}$$

$$TPM(z_m, z_{m+1}) \equiv P(z_{m+1} \mid z_m) \tag{10}$$

$$P(z_j \mid z_i) = \frac{n_{ij}}{\sum_{l} n_{il}} \tag{11}$$

where $m$ refers to the current state in time within a total of $M$ states, $P$ denotes the probability function, $TPM$ is the transition probability matrix, and $n_{ij}$ is the number of transitions observed from state $z_i$ to state $z_j$.

Cycle-length speed prediction provides a guideline speed and acceleration profile for an entire drive cycle as valuable information to anticipate energy consumption, whilst short-term prediction is able to provide a more precise prognosis using updated speed and acceleration values and a 5 s to 10 s window. Both models are elaborated in [53] and [54] for short-term and cycle-length speed prediction, respectively. Fig. 12 illustrates both prediction modules along with the information requirements. The results of the short-term prediction and the cycle-length speed guideline are included in Fig. 13 (left) and (right), respectively. Fig. 13 (left) illustrates the short-term prediction results for a 10 s window, and the long-term guideline profile is included in Fig. 13 (right).
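The counting estimate of the transition probabilities and the sampling of a future state sequence can be sketched as follows. This is a minimal illustration, not the predictors of [53], [54]: the bin widths used to discretize speed and acceleration, the dictionary-based sparse matrix, and the names `discretize`, `estimate_tpm` and `predict` are assumptions introduced here.

```python
import random
from collections import defaultdict


def discretize(speed, accel, speed_step=5.0, accel_step=0.5):
    """Map a (speed, acceleration) pair to a 2-D grid cell (assumed bin widths)."""
    return (int(speed // speed_step), int(accel // accel_step))


def estimate_tpm(states):
    """Estimate transition probabilities as n_ij divided by the row total."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    tpm = {}
    for current, row in counts.items():
        total = sum(row.values())
        tpm[current] = {nxt: n / total for nxt, n in row.items()}
    return tpm


def predict(tpm, start, steps, rng):
    """Sample a state sequence; each next state depends only on the current one."""
    path = [start]
    for _ in range(steps):
        row = tpm.get(path[-1])
        if not row:  # state never observed as a transition origin
            break
        nxt = rng.choices(list(row), weights=list(row.values()))[0]
        path.append(nxt)
    return path
```

With logged data, the state sequence would be built as `[discretize(v, a) for v, a in samples]`. One possible way to include driving style and road type, in line with the modules described above, is to estimate a separate matrix per driving style and road type.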

C. iHorizon Application to Intelligent Energy Management
The iHorizon framework allows for multiple applications in terms of energy management and autonomous feature development. Probably one of the most evident is the use of the cycle-length speed prediction for plug-in hybrid electric vehicle (PHEV) control optimization. Both the acceleration and speed profiles can be used along with a vehicle model to anticipate the driver power demand and cycle power requirements so as to guarantee full battery depletion by the end of the planned cycle. This is the case explained in [61], where the authors use multiple optimal control strategies, produced using dynamic programming, to teach a layer recurrent neural network using back propagation, where the weights update is modified using an extended Kalman filter (EKF). In this approach to neural network training, the weights update is described as finding the minimum square error of the weights estimation as in (12), using the EKF algorithm included in (13)−(15), where $\omega$, $\hat{\omega}_k$, $P_k$, $R_k$, $H_k$, and $Q_k$ are, respectively, the weights value, the weights estimation using EKF, the error covariance matrix, the measurement noise covariance matrix, the matrix of derivatives of the network outputs with respect to the weights, and the process noise covariance matrix [65]−[67].
The intelligent energy management resulting from the neural network training makes use of the iHorizon as illustrated in Fig. 14 and is able to emulate the optimal results with an average state-of-charge error of 3%, with potential real-time capability. The results in terms of fuel consumption were also very promising in the simulation environment, even for reasonably long drive cycles, with a maximum absolute error below 0.06 kg in all cases (Fig. 15).
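A textbook form of the EKF weight update can be sketched in plain Python for a model with a scalar output. This is an assumption-laden illustration and not necessarily the exact form of (12)−(15) in [61]: the gain is taken as K = P H / (R + Hᵀ P H), the covariance P is assumed symmetric, a single linear neuron stands in for the layer recurrent network, and the names `ekf_update`, `train`, `linear_predict` and `linear_jacobian` are hypothetical.

```python
def ekf_update(w, P, H, error, R=0.1, Q=1e-6):
    """One EKF step for the weight estimate of a scalar-output model.

    w: weight estimate, P: error covariance (assumed symmetric),
    H: derivative of the output with respect to each weight,
    error: measured output minus predicted output,
    R: measurement noise variance, Q: process noise variance.
    """
    n = len(w)
    PH = [sum(P[i][j] * H[j] for j in range(n)) for i in range(n)]
    S = R + sum(H[i] * PH[i] for i in range(n))      # innovation variance
    K = [v / S for v in PH]                          # Kalman gain
    w_new = [w[i] + K[i] * error for i in range(n)]  # weight update
    # Covariance update P - K H^T P + Q, using H^T P = (P H)^T for symmetric P.
    P_new = [
        [P[i][j] - K[i] * PH[j] + (Q if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    return w_new, P_new


def train(samples, n_weights, predict, jacobian, p0=100.0):
    """Run the EKF update over (input, target) samples, starting from zero weights."""
    w = [0.0] * n_weights
    P = [[p0 if i == j else 0.0 for j in range(n_weights)] for i in range(n_weights)]
    for x, y in samples:
        error = y - predict(w, x)
        w, P = ekf_update(w, P, jacobian(w, x), error)
    return w


def linear_predict(w, x):
    """Stand-in model: a single linear neuron without bias."""
    return sum(wi * xi for wi, xi in zip(w, x))


def linear_jacobian(w, x):
    """For the linear neuron, the output derivative w.r.t. the weights is the input."""
    return list(x)
```

For a real network, `predict` and `jacobian` would be replaced by the forward pass and the back-propagated output derivatives, and a vector-valued output would require a matrix inverse in place of the scalar division.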

VI. CONCLUDING REMARKS
Based on the ACP approach, this paper proposes a novel cloud-based CPSS parallel driving framework for synergizing future connected automated driving involving complex multilevel vehicle automation as well as driver-automation interactions.Within the parallel driving framework, parallel testing, parallel learning, parallel reinforcement learning, as well as development of intelligent horizon (iHorizon) towards parallel horizon are presented and discussed with examples.
The fundamental principle of parallel driving is that the designed computational artificial ADAV layer together with the allocated individual ADAV control modules deal with the complex automated driving while keeping the real vehicles as simple as possible.The proposed parallel driving is able to offer an ample solution for achieving a smooth, safe and efficient collaboration among connected automated vehicles with different levels of automation in future transportation.

Fig. 9. The theoretical framework of parallel reinforcement learning.

Fig. 11. Driving style recognition algorithm development using unsupervised machine learning, including subjective data collection, signal selection with k-means and cluster definition with Gaussian mixture models.

Fig. 12. Information requirement for short-term and cycle-length speed prediction. The short-term module uses a 0.2 s Markov chain, driving style and road type, whilst the cycle-length module incorporates the complete route, traffic information and the speed limit.

Fig. 13. (left) Short-term speed prediction on a highway road with a calm driver; (right) cycle-length guideline speed profile for a complete cycle combining city and highway driving with variable driving style.

Fig. 14. Intelligent energy management for PHEV based on iHorizon application and neural network control.

Fig. 15. (left) SoC error with respect to the global optimal reference (blue) using 10 and 20 neurons in the hidden layer; (right) the same for fuel consumption error.