Strategies & Market Trends : 2026 TeoTwawKi ... 2032 Darkest Interregnum


To: John Pitera who wrote (134581)7/14/2017 9:04:47 AM
From: TobagoJack

re nvda, was guided to it perhaps 18 months ago, but i unfortunately paid no attention as was and am bandwidth challenged. so i also wait.

going forward, it may be that nvda and such same face a different sort of competition than what they had heretofore been used to, one w/ more bandwidth, capability and capacity, to innovate as well as to sustain losses. open question as to whether such competitors go commercial. wait & see.

nextplatform.com

China Tunes Neural Networks for Custom Supercomputer Chip
Nicole Hemsoth
July 11, 2017

Supercomputing centers around the world are preparing their next generation architectural approaches for the insertion of AI into scientific workflows. For some, this means retooling around an existing architecture to make it capable of double duty for both HPC and AI.

Teams in China working on the top performing supercomputer in the world, the Sunway TaihuLight machine with its custom processor, have shown that their optimizations for the SW26010 architecture on deep learning models have yielded a 1.91-9.75X speedup over a GPU accelerated model using the Nvidia Tesla K40m in a test convolutional neural network run with over 100 parameter configurations.

Efforts on this system show that high performance deep learning is possible at scale on a CPU-only architecture. The Sunway TaihuLight machine is based on the 260-core Sunway SW26010, which we detailed here from both a chip and systems perspective. The convolutional neural network work was bundled together as swDNN, a library for accelerating deep learning on the TaihuLight supercomputer.

According to Dr. Haohuan Fu, one of the leads behind the swDNN framework for the Sunway architecture (and associate director at the National Supercomputing Center in Wuxi, where TaihuLight is located), the processor has a number of unique features that could potentially help the training process of deep neural networks. These include “the on-chip fusion of both management cores and computing core clusters, the support of a user-controlled fast buffer for the 256 computing cores, hardware-supported scheme for register communication across different cores, as well as the unified memory space shared by the four core groups, each with 65 cores.”



Despite some of the features that make the SW26010 a good fit for neural networks, there were some limitations teams had to work around, the most prominent of which was memory bandwidth limitations—something that is a problem on all processors and accelerators tackling neural network training in particular. “The DDR3 memory interface provides a peak bandwidth of 36GB/s for each compute group (64 of the compute elements) for a total bandwidth of 144 GB/s per processor. The Nvidia K80 GPU, with a similar double-precision performance of 2.91 teraflops, provides aggregate memory bandwidth of 480 GB/s…Therefore, while CNNs are considered a compute-intensive kernel care had to be taken with the memory access scheme to alleviate the memory bandwidth constraints.” Further, since the processor does not have a shared buffer for frequent data communications as are needed in CNNs, the team had to rely on a fine-grained data sharing scheme based on row and column communication buses in the CPU mesh.
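The bandwidth figures quoted above can be sanity-checked with a little arithmetic. This is a quick back-of-the-envelope sketch using only numbers that appear in the quote (the per-group bandwidth, the K80 aggregate bandwidth, and the shared 2.91 DP teraflops figure); the "bytes per flop" machine balance is a standard derived metric, not something the article computes:

```python
# Sanity-check the memory-bandwidth comparison quoted in the article.

groups_per_chip = 4            # four core groups per SW26010 processor
bw_per_group = 36              # GB/s, DDR3 peak bandwidth per core group
bw_sw26010 = groups_per_chip * bw_per_group
assert bw_sw26010 == 144       # matches the quoted 144 GB/s per processor

bw_k80 = 480                   # GB/s aggregate, Nvidia K80
print(f"K80 has {bw_k80 / bw_sw26010:.2f}x the SW26010's memory bandwidth")

# Machine balance: bytes of memory traffic available per double-precision
# flop, at the shared 2.91 DP teraflops figure. Lower means more
# bandwidth-starved, which is why the memory access scheme mattered.
dp_gflops = 2.91 * 1000
print(f"SW26010: {bw_sw26010 / dp_gflops:.3f} bytes/flop, "
      f"K80: {bw_k80 / dp_gflops:.3f} bytes/flop")
```

The roughly 3.3x bandwidth gap at equal double-precision throughput is the constraint the fine-grained data sharing scheme is working around.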

“The optimized swDNN framework, at current stage, can provide a double-precision performance of over 1.6 teraflops for the convolution kernels, achieving over 50% of the theoretical peak. The significant performance improvements achieved from a careful utilization of the SW26010's architectural features and a systematic optimization process demonstrate that these unique features and corresponding optimization schemes are potential candidates to be included in future DNN architectures as well as DNN-specific compilation tools.”

According to Fu, “By performing a systematic optimization that explores major factors of deep learning, including the organization of convolution loops, blocking techniques, register data communication schemes, as well as reordering strategies for the two pipelines of instructions, the SW26010 processor on the Sunway TaihuLight supercomputer has managed to achieve a double-precision performance of over 1.6 teraflops for the convolution kernel, achieving 54% of the theoretical peak.”
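Working backwards from the two figures Fu gives (1.6 sustained DP teraflops at 54% of peak) pins down the theoretical peak those numbers imply, a useful cross-check on the claim:

```python
# Back out the implied theoretical DP peak from the quoted figures.
achieved_tflops = 1.6        # sustained on the convolution kernel
fraction_of_peak = 0.54      # quoted efficiency
implied_peak = achieved_tflops / fraction_of_peak
print(f"implied theoretical peak: {implied_peak:.2f} DP teraflops")  # ~2.96
```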

To further get around the memory bandwidth limitations, the team created a three-pronged approach to memory for its manycore architecture. Depending on what is required, the CPE (compute elements) mesh can access the data items either directly from global memory or from the three-level memory hierarchy (register, local data memory and larger, slower memory).
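The payoff of a software-managed fast buffer is data reuse: copy the patch of input an output tile needs into local memory once, then hit it repeatedly instead of going back to slow global memory. The sketch below is an illustrative analogy in plain Python (it is not the swDNN code, and the tile size and helper names are invented for the example); the tiled version computes exactly the same convolution as the naive one while staging each input patch in a "local buffer":

```python
# Illustrative sketch: output-tiled 2D convolution that stages each
# input patch in a local buffer, by analogy with the SW26010's
# software-managed local data memory. Not the actual swDNN code.

def conv2d_naive(img, k):
    kh, kw = len(k), len(k[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            # every tap re-reads "global" memory
            out[i][j] = sum(img[i + di][j + dj] * k[di][dj]
                            for di in range(kh) for dj in range(kw))
    return out

def conv2d_tiled(img, k, tile=4):
    kh, kw = len(k), len(k[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for ti in range(0, oh, tile):
        for tj in range(0, ow, tile):
            # copy the input patch this output tile needs ONCE;
            # all kernel taps for the tile then reuse the local copy
            hi = min(ti + tile, oh) + kh - 1
            wj = min(tj + tile, ow) + kw - 1
            buf = [row[tj:wj] for row in img[ti:hi]]
            for i in range(ti, min(ti + tile, oh)):
                for j in range(tj, min(tj + tile, ow)):
                    out[i][j] = sum(buf[i - ti + di][j - tj + dj] * k[di][dj]
                                    for di in range(kh) for dj in range(kw))
    return out
```

On real hardware the win is that neighboring outputs in a tile share most of their input window, so each element is fetched from global memory roughly once per tile instead of once per kernel tap.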

Part of the long-term plan for the Sunway TaihuLight supercomputer is to continue work on scaling traditional HPC applications to exascale, but also to continue neural network efforts in a companion direction. Fu says that TaihuLight teams are continuing the development of swDNN and are also collaborating with Face++ for facial recognition applications on the supercomputer, in addition to work with Sogou for voice and speech recognition. Most interesting (and vague) was the passing mention of a potential custom chip for deep learning, although he was non-committal.

The team has created a customized register communication scheme that targets maximizing data reuse in the convolution kernels, which reduces the memory bandwidth requirements by almost an order of magnitude, they report in the full paper (IEEE subscription required). “A careful design of the most suitable pipelining of instructions was also built that reduces the idling time of the computation units by maximizing the overlap of memory operation and computation instructions, thus maximizing the overall training performance on the SW26010.”
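The "overlap of memory operation and computation instructions" the paper describes is the classic double-buffering pattern: fetch the next chunk of data while computing on the current one. A minimal Python sketch of that principle (the `load`/`compute` stand-ins and chunk sizes are invented for illustration; the real scheme pipelines instructions on the SW26010, not Python threads):

```python
# Illustrative double-buffering sketch: prefetch the next chunk while
# computing on the current one, so memory latency hides behind compute.
from concurrent.futures import ThreadPoolExecutor

def load(chunk_id):            # stand-in for a memory transfer
    return [chunk_id * 10 + x for x in range(4)]

def compute(chunk):            # stand-in for the convolution kernel
    return sum(chunk)

def pipeline(n_chunks):
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load, 0)                # prime the pipeline
        for cid in range(n_chunks):
            chunk = pending.result()                  # wait for current chunk
            if cid + 1 < n_chunks:
                pending = pool.submit(load, cid + 1)  # prefetch next chunk...
            results.append(compute(chunk))            # ...while computing
    return results

# pipeline(3) -> [6, 46, 86]
```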

Double precision performance results for different convolution kernels compared with the Nvidia Tesla K40 using the cuDNNv5 libraries.

To be fair, the Tesla K40 is not much of a comparison point to newer architectures, including Nvidia’s Pascal GPUs. Nonetheless, the Sunway architecture could show comparable performance with GPUs for convolutional neural networks—paving the way for more discussion about the centrality of GPUs in current deep learning systems if CPUs can be rerouted to do similar work for a lower price point.

The emphasis on double-precision floating point is also of interest, since the trend in training and certainly inference is to push precision lower while balancing accuracy requirements. Also left unanswered is how convolutional neural network training might scale across the many nodes available—in short, is the test size indicative of the scalability limits before the communication bottleneck becomes too severe to make this efficient? However, armed with these software libraries and the need to keep pushing deep learning into the HPC stack, it is not absurd to think Sunway might build their own custom deep learning chip, especially if the need arises elsewhere in China—which we suspect it will.

More on the deep learning library for the Sunway machine can be found at GitHub.





To: John Pitera who wrote (134581)7/14/2017 9:09:23 AM
From: TobagoJack
 
between the possibilities of quantum everything and different architectures of computing being investigated, would say tech is at the cusp of phase-change, and next 2018 - 2032 should be interesting

time.com

Scientists Just Teleported an Object Into Space for the First Time

Scientists have successfully teleported an object from Earth to space for the first time, paving the way for more ambitious and futuristic breakthroughs.

A team of researchers in China sent a photon from the ground to an orbiting satellite more than 300 miles above through a process known as quantum entanglement, according to MIT Technology Review. It’s the farthest distance tested so far in teleportation experiments, the researchers said. Their work was published online on the open access site arXiv.

For about a month, the scientists beamed up millions of photons from their ground station in Tibet to the low-orbiting satellite. They were successful in more than 900 cases.

“This work establishes the first ground-to-satellite up-link for faithful and ultra-long-distance quantum teleportation, an essential step toward global-scale quantum Internet,” the team said in a statement, according to MIT Technology Review.

The MIT-owned magazine described quantum entanglement as a "strange phenomenon" that occurs "when two quantum objects, such as photons, form at the same instant and point in space and so share the same existence." "In technical terms, they are described by the same wave function," it said.

The latest development comes almost a year after physicists successfully conducted the world's first quantum teleportation outside of a laboratory. Scientists at that time determined quantum teleportation, which is often depicted as a futuristic tool in science-fiction films, is in fact possible.




To: John Pitera who wrote (134581)7/15/2017 7:04:40 AM
From: dvdw©
 
Hi John, thanks for your inputs.

You say " Liquidity is leaving the aggregate system..... seasoned observers believe we have macro distribution of stocks going on."

Not so sure this is measurable during this cycle of derivative extrapolation. Position states change on the basis of legal supply and demand. Legal supply is governed by rules, and those rules have been so thoroughly obfuscated by the derivative world that it's very difficult to know the status of the underlying legal issuance.

The next article illustrates how difficult it is to create tests of measurability about the status of particulars without some sort of containment medium.

nextbigfuture.com

excerpt: Researchers used laser light to link caesium atoms and a vibrating membrane. The research, the first of its kind, points to sensors capable of measuring movement with unprecedented precision, beyond Heisenberg limits.

A number of experiments demonstrate that Heisenberg's Uncertainty Principle can to some degree be neutralized. This has never been shown before, and the results may spark development of new measuring equipment and new, better sensors.