Video surveillance systems on Blackfin: applications and advantages

1. Status of video surveillance systems

Video surveillance systems began with analog closed-circuit television, went through digitization and networking, and are now moving in a distributed, intelligent direction. Advances in video compression enabled the digitization of video surveillance and saved a great deal of storage space, while the spread of computer networks and growing bandwidth made metropolitan-area-network video surveillance a reality. After more than 40 years of sustained effort by researchers, computer vision has entered a period of breakthrough development, and thanks to those research results, intelligent video surveillance systems have begun to be industrialized.

Since the mid-1990s, a number of U.S. universities, led by Carnegie Mellon University (CMU) and the Massachusetts Institute of Technology (MIT), have participated in VSAM (Visual Surveillance and Monitoring), the major visual surveillance project established by the U.S. Defense Advanced Research Projects Agency. Together with results from other research institutions, this drove rapid progress in intelligent visual analysis. After the 9/11 attacks in the United States in 2001, and subsequent terrorist attacks such as the Madrid train bombings in Spain and the London Underground bombings, worldwide demand for video surveillance systems, including intelligent video analysis systems, grew to unprecedented levels. More than 4.2 million cameras have been installed across the UK, about one for every 14 people, and a person may appear in front of as many as 300 cameras a day (The Daily Mail, UK). In China, 250,000 security cameras had been installed in Guangzhou by the end of 2007. Beijing, building on an existing base of 263,000 cameras, has installed public image information systems at all key units, crowded public places, major transportation hubs, critical urban infrastructure, and other areas designated by law, all connected to the police monitoring network. Shanghai planned to install more than 200,000 surveillance cameras on its roads by 2010 to establish a "social prevention and control system." Such massive volumes of surveillance imagery require video surveillance systems that can intelligently select what to compress, store, and retrieve.

At present, besides CMU and MIT, the embedded smart camera research group at Graz University of Technology in Austria, IBM's S3 (Smart Surveillance System) project team, and Intel's IrisNet (Internet-scale Resource-Intensive Sensor Network Services) project team each lead in different areas of distributed intelligent surveillance. Companies such as ObjectVideo, Hisign, and 3VR have taken the lead in commercializing intelligent video surveillance. In China, the Institute of Automation of the Chinese Academy of Sciences and the Departments of Electronic Engineering and Automation at Tsinghua University are at the forefront of research.

2. A brief introduction to the technical background of intelligent video surveillance

One of the core functions of intelligent video surveillance is the automatic tracking of specific targets. Target tracking can be divided into five steps: motion detection, target classification, target (type) tracking, behavior analysis, and target (individual) tracking. Take tracking of a person as an example: first, moving objects are detected in the real-time image sequence (i.e., the video); next, human bodies are identified among the moving objects; their movement is then tracked, and people with abnormal behavior are singled out, such as someone leaving a package behind at a station or airport; finally, those individuals are tracked continuously.

Motion detection extracts the regions of an image sequence that have changed relative to the background. Effective segmentation of the motion regions greatly reduces the amount of computation in subsequent stages. However, instabilities in the background image, such as shadows, lighting changes, very slow motion (a snail crawling), and repetitive motion of static objects (swaying leaves), make motion detection difficult.

There are two ways to implement motion detection in a video surveillance system. One directly reuses intermediate results of the video compression algorithm; for example, ADI's third-party partners use the motion vectors produced during MPEG-4 and H.264 encoding, so that motion detection and video compression are performed together on the processor. The other method is independent of the video encoder.

Motion detection algorithms can be grouped in several ways depending on the classification criterion. The Institute of Automation of the Chinese Academy of Sciences divides them into three types: background elimination, time differencing, and optical flow. Background elimination and time differencing can both be regarded as difference-image methods. Background elimination is currently the most common approach to motion segmentation: it detects motion regions from the difference between the current image and a background image. Time differencing extracts motion regions by pixel-wise differencing and thresholding between two or three adjacent frames of a continuous image sequence. Optical-flow-based motion detection uses the optical-flow characteristics of a moving target over time: a contour-based tracking algorithm is initialized by computing the displacement-vector optical-flow field, so that the moving target can be extracted and tracked effectively. The advantage of this method is that it can detect independently moving targets even when the camera itself is moving.
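As a concrete illustration of the two difference-image methods, here is a minimal sketch in plain Python (pixel values as nested lists of 0-255 integers; the threshold of 30 is an arbitrary assumption, not a value from any particular system):

```python
# Minimal sketches of the two difference-image methods described above.
# Frames are grayscale images stored as lists of rows of 0-255 ints.

def background_subtraction(frame, background, threshold=30):
    """Mark pixels that differ from a (static) background image."""
    return [[1 if abs(p - b) > threshold else 0
             for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

def frame_difference(frame, prev_frame, threshold=30):
    """Mark pixels that changed between two adjacent frames.

    Per pixel this is the same operation as background subtraction,
    just with the previous frame standing in for the background model.
    """
    return background_subtraction(frame, prev_frame, threshold)
```

Three-frame differencing applies the same per-pixel operation twice (against the previous and the next frame) and combines the two masks, which suppresses the "ghost" left at the object's old position.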

The purpose of target classification is to pick out, from the detected motion regions, those belonging to a specific type of object. Depending on the information used, target classification can be based on motion characteristics or on shape information. Motion-based recognition exploits the periodicity of target motion and is less affected by color and lighting. Shape-based recognition matches shape features of the detected motion region against templates or statistical models.
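A toy sketch of shape-based classification, using only the bounding-box aspect ratio of a motion region (the thresholds and class names here are illustrative assumptions, not values from any real system):

```python
# Illustrative shape-based classifier: the bounding-box aspect ratio of
# a detected motion region separates upright pedestrians (tall) from
# vehicles (wide). Thresholds are arbitrary assumptions for the sketch.

def classify_region(width, height, tall_ratio=1.5, wide_ratio=1.5):
    """Classify a motion region by its bounding-box aspect ratio."""
    if height >= tall_ratio * width:
        return "person"
    if width >= wide_ratio * height:
        return "vehicle"
    return "unknown"
```

A practical classifier would combine this with the gradient histogram and motion-periodicity cues mentioned in the text rather than rely on aspect ratio alone.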

Target tracking establishes correspondences between consecutive image frames based on features such as position, velocity, shape, texture, and color. By method, it can be divided into model-based, region-based, active-contour-based, and feature-based tracking, among others.
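As an illustration of matching between consecutive frames, the following sketch matches targets by centroid position alone; real trackers would also weigh velocity, shape, texture, or color, and `max_dist` is an arbitrary assumption:

```python
# Toy frame-to-frame matcher: greedily pair each previous-frame target
# centroid with the nearest unused current-frame centroid. Position is
# the only feature used here, purely for illustration.
import math

def match_targets(prev_centroids, curr_centroids, max_dist=50.0):
    """Return {prev_index: curr_index} for centroids within max_dist."""
    matches = {}
    used = set()
    for i, (px, py) in enumerate(prev_centroids):
        best_j, best_d = None, max_dist
        for j, (cx, cy) in enumerate(curr_centroids):
            if j in used:
                continue
            d = math.hypot(cx - px, cy - py)
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            matches[i] = best_j
            used.add(best_j)
    return matches
```

Greedy nearest-neighbor matching is the simplest data-association scheme; it breaks down when targets cross paths, which is why richer features and motion models are used in practice.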

Joint target tracking and classification (JTC) is an emerging research direction in information fusion. Its basic idea is that two-way information exchange between the target tracker and the target classifier can improve tracking accuracy and classification performance at the same time.

In some circumstances, tracking must be refined from the type level to the individual level, which requires analyzing and understanding the target's behavior. The key problem in behavior understanding is how to learn reference behavior sequences from training samples; the learning and matching of behavior sequences must tolerate small feature variations in space and time within the same class of motion patterns.

3. Difficulties in implementing intelligent video surveillance systems and the advantages of Blackfin

Despite great progress, there is still no recognized best method in intelligent video analysis. The complexity of the subject keeps research methods and tools diverse, the algorithms computationally heavy, and the scope of application limited; no universal method yet satisfies the combined requirements of robustness, accuracy, and speed. At the same time, the demand for networked, distributed processing in video surveillance, together with the cost, size, and power constraints of large-scale deployments, has made embedded processors with ever-increasing computing power and bandwidth the mainstream choice for video surveillance systems. Non-standardized intelligent video analysis is exactly where a DSP comes in.

The Blackfin processor is a convergent processor jointly developed by ADI and Intel. Its MSA (Micro Signal Architecture) combines MCU control capability with DSP high-speed computation: MCU and DSP are integrated in the same core and need only one set of development tools and one instruction set. Compared with separate DSP-plus-ARM architectures, this simplifies both hardware and software. Blackfin supports more than ten embedded operating systems, including ThreadX, Nucleus, µC/OS-II, and µClinux, giving customers a familiar software foundation. Optimized for high-intensity, high-data-rate digital and media processing, Blackfin is an ideal video processor with very high cost-performance, and its low power consumption suits IP camera products with small enclosures.

Blackfin's dozens of DMA channels and flexible cache meet the heavy computation and high data-throughput demands of video surveillance. Its ten-stage pipeline gives it strong instruction-level parallelism, and zero-overhead loop control means the many loop branches in such systems no longer cost any processor cycles. Exploiting these features, the 4×4 IDCT of the Real decoder runs 7 times faster on Blackfin.

Video data has its own characteristics: in the common color spaces, each component of a pixel is usually 8 bits wide. Blackfin's four video ALUs and its video pixel instruction set greatly accelerate video processing. A single video pixel instruction can complete, in one cycle, one of 11 kinds of operations on four pairs of video data components, such as addition, subtraction, averaging, or subtract-and-absolute-value. These operations are used heavily in the motion estimation and loop filtering of codec algorithms and throughout intelligent video analysis. Basic operators of intelligent video analysis, such as histogram statistics, median filtering, the Sobel operator, and morphological dilation, can use Blackfin's MIN and MAX instructions to eliminate conditional jumps and save processor cycles. Beyond that, Blackfin supports 13 kinds of vector operations on non-video data. With a suitably designed data structure, foreground/background separation and threshold calculation and update can use these special instructions to speed up intelligent video analysis, and most of these instructions can execute in parallel, further multiplying Blackfin's processing power.
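Written out in plain Python for clarity (this is not Blackfin code), the sum-of-absolute-differences kernel at the heart of motion estimation, and min/max pixel clipping, look like this; these are the inner loops that the pixel and MIN/MAX instructions accelerate:

```python
# The sum-of-absolute-differences (SAD) kernel used in block-based
# motion estimation, and saturation clipping via min/max. On Blackfin
# the inner subtract/abs/accumulate is handled by the quad-8-bit video
# pixel instructions, and the clip needs no conditional jumps.

def sad_block(block_a, block_b):
    """SAD between two equally sized pixel blocks (lists of rows)."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def clip_pixel(v, lo=0, hi=255):
    """Clamp a pixel value into [lo, hi] using min/max instead of ifs."""
    return max(lo, min(hi, v))
```

The motion estimator evaluates `sad_block` for many candidate displacements and keeps the one with the smallest SAD, which is why accelerating this kernel dominates encoder performance.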

4. Examples of intelligent video surveillance systems

The Department of Automation of Tsinghua University has long-term research and accumulated expertise in visual analysis. Combining this with ADI's strengths, the two parties implemented an intelligent video surveillance system on the dual-core Blackfin BF561 processor: ADI provided a high-quality, high-performance H.264 encoding algorithm, and Tsinghua implemented the automatic tracking algorithm on the BF561. The system block diagram is shown in Figure 1.

Figure 1: Block diagram of an intelligent monitoring terminal based on BF561

The H.264 encoder module is one of the free software modules ADI provides to Blackfin customers, with versions for both the BF53x and the BF561. It supports fully dynamic parameter configuration: while the system is running, users can change the bit rate, frame rate, key-frame interval, quantization value, and so on according to the scene and the available network bandwidth. From an 80 kbps CDMA link to a 3 Mbps DVR system, the same function library delivers good coding quality, giving it strong adaptability and flexibility.

The intelligent tracking algorithm from Tsinghua's Department of Automation uses background subtraction with a single-Gaussian background model for motion detection. The target classification stage combines motion-based and shape-based classification, using aspect ratio, gradient histograms, and motion periodicity to distinguish human bodies from vehicles. When tracking similar targets, a region-based algorithm determines the direction and distance of the displacement of the moving object's centroid between consecutive frames. On top of this three-stage algorithm, the system also implements crowd tracking, intrusion detection, counting of people and vehicles, detection of abandoned objects, and alarms for illegal camera occlusion or displacement.
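A per-pixel single-Gaussian background model of the kind described can be sketched as follows (the learning rate `alpha`, threshold `k`, and variance floor are illustrative assumptions, not the parameters of Tsinghua's implementation):

```python
# Sketch of a single-Gaussian background model: each pixel keeps a
# running mean and variance, and a pixel is flagged as foreground when
# it deviates from the mean by more than k standard deviations.

def update_pixel(mean, var, value, alpha=0.05, k=2.5, min_var=4.0):
    """Classify one pixel and update its Gaussian background model.

    Returns (is_foreground, new_mean, new_var).
    """
    d = value - mean
    # Compare squared deviation against (k * sigma)^2, with a floor on
    # the variance so a perfectly static pixel does not over-trigger.
    is_fg = d * d > k * k * max(var, min_var)
    # Exponentially weighted running update of the mean and variance.
    new_mean = mean + alpha * d
    new_var = (1 - alpha) * (var + alpha * d * d)
    return is_fg, new_mean, new_var
```

In a full system this update runs on the Y component of every pixel in the line buffer; production implementations typically slow or skip the update for pixels currently classified as foreground so that a stopped object is not absorbed into the background too quickly.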

In the system, Core A of the BF561 runs the H.264 encoder, while Core B performs intelligent video analysis. Core A also runs the µC/OS-II operating system with RTP and TCP/IP protocol stacks. YUV 4:2:2 video frames are transferred to an SDRAM buffer via the PPI (Parallel Peripheral Interface) in DMA mode, and the two cores share this frame buffer. Core B first starts a memory DMA to move the Y (luminance) component of each video frame into a line buffer in its on-chip L1 SRAM, then performs background modeling, motion detection, and target tracking on that data. If an object of the specified type appears in view, Core B interrupts Core A, which can send alarm information to a local console via the UART interface or to a remote console over the network; Core B can also modify the frame buffer to draw a rectangular border around the target. Core A likewise receives the luminance and chrominance data from the frame buffer through a memory DMA and encodes the frames processed by Core B, while the system can output the modified frame buffer through another PPI interface to display tracking results in real time. The target tracking algorithm runs in real time, so it adds no encoding delay. When no moving object is detected, the encoder can run at a low bit rate or frame rate, or not encode at all; once a moving object of the specified type is detected, the encoder resumes normal operation and uploads the compressed stream with timestamps to the management system over Ethernet. This not only saves storage space but also makes later retrieval of recordings easier.

The system can also accept monitoring-area boundaries over the UART or Ethernet interface to define the range of intrusion detection; when a moving object crosses the boundary, the system immediately alerts the console. The console can likewise send commands to the intelligent monitoring terminal to switch freely among its functions, from intelligent tracking to intrusion detection, abandoned-object detection, or people counting. Without Blackfin's processing power and flexibility, implementing so many complex functions on an embedded processor would be almost unimaginable.

5. The development trend of intelligent video surveillance

Although intelligent video analysis is already used in video surveillance, it still has a long way to go. An ideal intelligent video surveillance system might work like this:

Suppose a shooting suddenly occurs on a city street and the suspect flees toward a nearby car, trying to drive away. Every move, however, is captured by the public safety monitoring network. First, a surveillance node with gunshot recognition and sound-source localization detects the shot, immediately steers the camera toward its direction, raises the first alarm, and reports the approximate location of the shooting. The camera then collects video, detects moving human bodies, analyzes their behavior, and locates and tracks the suspect. Once the suspect is localized, suitably placed cameras in the system are tasked to extract the suspect's facial features and the license plate of the getaway vehicle, upload them to the management system, create a database entry, and distribute it to public security bureaus, stations, airports, banks, customs, and other key units. The monitoring system switches to tracking the vehicle, and police set up forces along the suspect's route to intercept him. Even if the suspect escapes the initial pursuit, as soon as he appears in front of any camera in the country he cannot escape arrest.

Such a system combines a variety of advanced monitoring technologies. The fusion of audio and video, of visual and non-visual imaging, and of target tracking with behavior analysis and feature recognition will be the trend in future security systems. Each of these technologies has already developed considerably on its own; Blackfin, for instance, has many applications in infrared cameras and phased-array microphones. More accurate, faster, and more robust intelligent visual analysis algorithms remain the hard part. ADI will continue to cooperate with research institutions and enterprises worldwide in intelligent video surveillance to create a safer and better life.

Description of Antenk's Slim D-Sub Connector

The Slim D-Sub Connector is a space-saving D-sub socket connector with a depth of 8.54 mm, which reduces the board mounting area by 33% compared with standard models. The slim-profile offering includes D-sub sockets with nine right-angle DIP terminals, for mounting-board thicknesses of either 1.6 mm or 1.0 mm (differing in lock-pin structure). RoHS compliant, the Slim D-Sub Connector is rated at 3 A and 300 VAC, with a mating durability of 100 cycles over an operating temperature range of -25°C to +105°C.


Features of Antenk's Slim D-Sub Connector

Board mounting area reduced by 33% (compared with standard models) thanks to the 8.54 mm depth

D-sub sockets with nine right-angle DIP terminals

Mounting-board thickness of either 1.6 mm or 1.0 mm (differing in lock-pin structure)

RoHS Compliant


Applications of Antenk's Slim D-Sub Connector

Factory Automation

Machine Tools

Power Supplies

Medical Tools

Test & Measurement

LSI/FPD Manufacturing Systems

Information Transmission Tools

Security Tools

Industrial Tools


ShenZhen Antenk Electronics Co., Ltd., https://www.antenk.com