Intel Collective

A space for developers to collaborate on Intel software tools, libraries, and resources. Share knowledge and connect with Intel product experts. Find the information you need to drive innovation and simplify development from edge to cloud with Intel.

Pinned Articles

View all 14 articles

Intel employees have deemed these posts noteworthy.

Pinned
3 votes
115 views
19 minute read
How-to guide

Migrating the Jacobi Iterative Method from CUDA* to SYCL*

Overview: This document demonstrates how a linear algebra Jacobi iterative method written in CUDA* can be migrated to the SYCL* heterogeneous programming language. Jacobi iterative method: The Jacobi ...
Pinned
2 votes
169 views
1 minute read
How-to guide

How to bring your own dataset and retrain a TensorFlow model with OpenVINO™ toolkit

The notebooks repo on GitHub is a collection of ready-to-run Jupyter notebooks, for learning and experimenting with the OpenVINO™ toolkit. One of the examples is a notebook titled 301-tensorflow-...

Questions

Browse questions with relevant Intel tags

27,276 questions

2 votes
3 answers
60 views

Zero optimization doesn't detect heap overflow

I know it must've been answered somewhere, but I didn't find any information regarding this strange behavior. I was just messing around with the heap, and when I executed the program with zero ...
-1 votes
0 answers
16 views

How is the short jump instruction encoded in x86?

I've been messing around with some low-level stuff, like small compilers and simple bootloader programs. I'm trying to figure out how the operands of jump instructions are encoded in binary, ...
0 votes
0 answers
21 views

Intel IO APIC "Established APIC Programming Model"

I was reading the Intel Atom® processor Z8000 series: Datasheet, vol. 1, which can be found here: https://cdrdv2.intel.com/v1/dl/getContent/332065. In the IO APIC section, it states that the IO APIC features an ...
2 votes
0 answers
24 views

Memory throughput for strided memory accesses

I am measuring memory throughput and runtimes using _mm256_i32gather_epi32 intrinsic. Here is the loop I use for testing: for (int i = 0; i < len; i+=8) { const __m256i* indexes_2 = ...
1 vote
0 answers
33 views

reversing an array and printing it in x86-64 (nasm)

I am trying to print an array, reverse it, and then print it again. I manage to print it once. I can also make 2 consecutive calls to _printy and it works. But the code breaks with the reverse ...
0 votes
1 answer
34 views

Increment a variable in nasm assembly

I am making an OS in 32-bit protected mode assembly and need to increment (or decrement) a variable. My variable is defined like this: Var: db 1 and I am trying to increment it like this: mov ebx, [...
0 votes
0 answers
24 views

Asking CPU or kernel for attention?

I've noticed that commonly if I ran: while { start = now() pseudoFunction() end = now() } will run N x faster than (faster means duration of end - start) while { start ...
-1 votes
0 answers
15 views

Intel RealSense L515 dataset for object detection

I am looking for a LiDAR dataset from the L515 sensor. I need to do object detection. Is there any other dataset from the L515 for object detection?
0 votes
1 answer
32 views

QEMU call goes to wrong address

I've been working on a small osdev project. So far I've gotten to running C code with A20, GDT, protected mode (32-bit) and disk loading, but function calls are not working. I've confirmed the actual ...
2 votes
1 answer
26 views

Why do I get an illegal instruction using __builtin_ia32_wrfsbase64 when skylake has fsgsbase?

I'm running my code on a server with "Intel Xeon Processor (Skylake, IBRS)". I listed the cpu flags at the bottom. I got a core dump, ran it in gdb and saw the illegal instruction was ...
1 vote
2 answers
59 views

Stack address size on 80486, using gcc

GCC manual says that -m32 "generates code that runs on any i386 system". Assume I want to write a function that swaps the bytes of a 32 bit number: .text .globl SwapBytes .type ...
1 vote
0 answers
66 views

JWASM with AVX cannot compile and AVX directive

I have a MASM program which uses AVX but I get errors when I try to assemble it with JWASM (a MASM clone). I couldn't find any directives to properly assemble the AVX instructions (like .mmx or .xmm). ...
2 votes
0 answers
55 views

Implementation of Stackful Coroutine in C++

Now I am trying to implement stackful coroutine in C++17 on Windows x64 OS, but, unfortunately, I have encountered the problem: I can't throw exception in my coroutine, if I do so, the program is ...
2 votes
0 answers
89 views

How to force C code to use a specific Intel instruction?

I am calculating Entropy for a specific array of bytes. since I must calculate log2 of some numbers. Here is part of the code for (i = 0; i < ENTROPY_ARRAY_SIZE; i++) { if (entropy[i] == 0) ...
1 vote
1 answer
56 views

ld makes wrong call instruction

I'm trying to make a small kernel. So far I've gotten to protected mode with A20 enabled, GDT loaded and kernel loaded from disk. In C I've tried function calls, but for some reason ld makes a wrong ...
0 votes
0 answers
26 views

usage of $ to address memory in nasm (x86-64)

I have tried to get this code to work all day, and finally I managed. I am looping through an array and computing the sum (I know pretty miserable), and everything seems to work, feel free to make ...
0 votes
0 answers
27 views

Why do I have 35% of my cpu stalled-cycles-backend when only using 200MiB of ram?

I'm trying to profile my app and make it faster but I'm unsure if it's hardware/memory bound or if I can improve it. I use perf stat to profile my app and saw 35% next to stalled-cycles-backend. The ...
0 votes
1 answer
63 views

Weird behavior when dereferencing pointer that points to the address of an instruction

I'm doing some reverse engineering on an ELF 32-bit executable. Here is the code of the .text section : 08048080 <.text>: 8048080: b8 04 00 00 00 mov eax,0x4 8048085: bb 01 00 ...
7 votes
0 answers
88 views

MSVC generating unnecessarily complicated instructions

While benchmarking code involving std::optional<double>, I noticed that the code MSVC generates runs at roughly half the speed compared to the one produced by clang or gcc. After spending some ...
2 votes
2 answers
88 views

How to convert Intel Assembly C to AT&T C++

I am trying to convert the function "__cpuid" from C to C++. My problem is that the g++ compiler does not work with Intel assembler. I'm trying to translate this code: __asm { ...
0 votes
0 answers
20 views

How to speed up the Aarch64 Android Emulator on X86 ubuntu 18.04

I am trying to implement my tools on an APK, which can be only installed on arm-v8a or armeabi devices, so when I download the Android-23 image with arm-v8a on my ubuntu and run it, it acts really ...
1 vote
2 answers
45 views

SSE interleave/merge/combine 2 vectors using a mask, per-element conditional move?

Essentially I am trying to implement a ternary-like operation on 2 SSE (__m128) vectors. The mask is another __m128 vector obtained from _mm_cmplt_ps. What I want to achieve is to select element of ...
0 votes
0 answers
23 views

Problem to load Qt dependencies for python after load Intel oneAPI environment

I'm trying to automate some tasks for my PhD research. Basically I have a .bat script that does the following: Activates the Intel OneAPI environment to compile some C code in order to run some numerical ...
5 votes
1 answer
83 views

Why does gcc implement fmin and fmax in three different ways?

I have a few routines here that all do the same thing: they clamp a float to the range [0,65535]. What surprises me is that the compiler (gcc -O3) uses three, count 'em, three different ways to ...
-3 votes
0 answers
20 views

Quartus DE1-SoC UART PIN Assignment Fitter Error

I am trying to use the UART port on my DE1-SoC. From the datasheet, I know that HPS_UART_RX is at PIN_B25 and HPS_UART_TX is at PIN_C25 which are the 3.3V HPS UART Receiver and Transmitter ...
1 vote
1 answer
42 views

split function in x86 asm (loop over characters of a string)

So, I need to split a string character by character. In JavaScript it would look like this: let text ="abcde"; for (var i = 0; i < text.lenght; i++); { console.log(text.charAt(i)); } but ...
0 votes
1 answer
41 views

C++-Fortran mixcompile unresolved externals

I am trying to make a fortran-C++ mix compile project. I use the case to test how the visual studio 2017 compiler link the intel fortran .lib file. Here is the example code: Fortran code here: !DEC$ ...
0 votes
0 answers
24 views

SYSCALL and SYSRET Developer Setup

Am I correct in understanding that syscall relates to some register or block of memory that contains a jump table which can be set up by the kernel developer? And then sysret just returns you to the ...
1 vote
0 answers
25 views

mkl_sparse_d_mv is between +25% to -50% performant than -O3 intel auto-vectorisation on Xeon Phi

Using Intel MKL's mkl_sparse_d_mv function on our physics solver to perform a sparse matrix-vector multiplication yields a speedup of between -50% and +25% depending on the sparse matrix used on each ...
0 votes
0 answers
47 views

Iteration over 2d matrix generates more cache references in perf after switching order of indices

I came across this GitHub repo, which links to this blog post about cache aware programming. It compares the number of cache-references and cache-misses in perf output depending on the spread of data ...
-2 votes
1 answer
36 views

How can I solve error LNK2019 with GLFW in librealsense example code project "capture" for c++

I've looked and tried the solutions provided in the many other posts regarding my issue but none of them worked for me. So I've been working on a project that requires the use of librealsense, a ...
0 votes
0 answers
38 views

Trying to implement 2D array addition in DPC++

I am learning DPC++ and trying to implement a 2D array matrix program. I am stuck partway through the program. Please check the code below and help me. #include<CL/sycl.hpp> #include<...
0 votes
0 answers
47 views

when writing 64-bit reverse shell in assembly got stuck at CreateProcessA API

Hello, I am writing a Windows 64-bit reverse shell in assembly, and after getting connected to the target machine IP, I want to create a process to spawn a shell. First, I try to write the startinfo struct for ...
1 vote
0 answers
20 views

How can I debug a x86 ELF application in gdb on a M1 Macbook?

I am trying to debug a x86 ELF on a M1 Macbook. My current setup is to use qemu in a Linux VM, but qemu does not implement ptrace, which makes debugging cross arch binaries significantly harder (no ...
2 votes
1 answer
65 views

Why does Clang add extra FMA instructions?

#include <immintrin.h> __m256 mult(__m256 num) { return 278*num/(num+1400); } .LCPI0_0: .long 0x438b0000 # float 278 .LCPI0_1: .long 0x44af0000 ...
-1 votes
1 answer
43 views

CR0 contains PE/PG flags right upon the Linux Kernel startup

I'm using the GNU GRUB version 2.04 bootloader and Linux Kernel 5.19-rc2. I'm debugging the Linux Kernel initialization and expected that right upon the Kernel startup the CPU should be in real mode. ...
0 votes
1 answer
52 views

Intel intrinsics: vector comparison result to array of bool conversion

I have several functions used to compare floating-point math vectors that fill an array of booleans (with the result of each comparison). Currently, I am comparing them element-by-element, however I would ...
0 votes
0 answers
57 views

how to vectorize arrow::compute::Take?

I have an array of large size input_array and an array of offsets take_array. I want to return the elements with those offsets very fast. Can I vectorize it for the arrow array? If so, how? arrow::...
-2 votes
1 answer
55 views

Why is the difference between the addresses of an int and a char 16?

I declared an integer (32 bits) and a char (8 bits), and I know that a computer address is 1 byte at a time, so why is the difference between those two addresses 16 and not, let's say, 40? if each memory ...
2 votes
1 answer
62 views

Int argument passed from C to Assembly loses its sign

I am trying to implement a sign function in assembly (64 bit) and then calling it from C. My function takes in one argument: an unsigned int and returns an int, which can be -1 (if the argument is ...
1 vote
0 answers
43 views

why does executing `cmpq` instruction produce `signal SIGTTIN, Stopped`?

I'm reading Programming from the Ground Up, and read the code on page 63 in Chapter 5's "Using Files in a Program". I tried to convert this 32-bit code to 64-bit code. Below is my code. # system call ...
1 vote
1 answer
64 views

How to combine fortran project with C++ project?

I am working on a numerical project with C++, but it will use several Fortran subroutines from another Fortran project. The Fortran project has header files and multiple subroutine files. The file ...
0 votes
1 answer
39 views

Is there any difference in the multiple encodings of the same x64 instruction?

I am doing some experiments on x64 assembly instructions, using the Miasm framework. Consider the snippet below, where I disassemble and reassemble the bytecode of LEA RAX, [RIP + 1]: from miasm....
2 votes
0 answers
25 views

RAPL on AWS cloud VM? No intel_rapl and related files in directory /sys/class/powercap

I want to use the Intel RAPL interface to record the energy consumption on CPUs. I am using two servers: an AWS EC2 instance and a physical machine. However, I found that there is no powercap ...
0 votes
0 answers
60 views

Learning assembly. how to make code faster

I started to learn assembly some days ago and I wrote my first ever piece of code using user input, string functions, passing arguments by stack or by register etc... I have some questions. Do you ...
0 votes
0 answers
29 views

Switching macro into a function

So I have a macro code from help of the teacher (this worked previously) %macro aBin2int 2 push rbp mov rbp, rsp sub rsp, 8 push rcx push rdx mov dword [%2], 0 mov ...
1 vote
1 answer
62 views

How to matrix-multiply two sparse SciPy matrices and produce a dense Numpy array efficiently?

I'd like to matrix-multiply two sparse SciPy matrices. However, the result is not sparse, so I'd like to store it as a NumPy array. Is it possible to do this efficiently, that is without creating a ...
1 vote
1 answer
24 views

how to merge multiple arrays to one array in masm32

I want to concatenate array1, array2, ... into one array. What is a fast way to do this? .data array1 db 0CFh, 0C2h, 0ABh, 01Bh, 0C1h, 007h, 0F7h, 06Dh, 0DAh, 0F2h, 0DCh, 03Ch....; about 2000 dup ... array2 ...
3 votes
1 answer
64 views

Linux Kernel function memblock_alloc_range_nid is not presented in the address space

I'm trying to debug physical memory allocation to understand which parts of the Linux Kernel use memblock_alloc_range_nid on x86-64 and how. I'm running the latest Linux Kernel 5.19-rc2 built from ...
2 votes
3 answers
64 views

Fortran in Visual Studio 2022

I installed Visual Studio 2022. I then installed Intel OneAPI Base Toolkit followed by the HPC toolkit. Everything seems fine and I am able to create a Fortran project, but I am unable to run any code....

